REVISED ERROR COMPONENTS: MORE EFFICIENT ESTIMATION WITH COMBINED CROSS SECTION AND TIME SERIES

Robert W. Resek

Faculty Working Paper #228
College of Commerce and Business Administration
University of Illinois at Urbana-Champaign
January 29, 1975

1. Introduction

Estimation of combined cross-section time-series models is frequently carried out by error components methods as well as by covariance models. In this paper we analyze the small sample relation between these estimators. In addition we propose a specific new estimator, the revised error components estimator, whose small sample efficiency is always greater than that of either competing estimator.

In this work we rely heavily on previous studies of error component models. Swamy and Arora [10] developed exact finite sample results and an a-class of estimators. As one might expect, the revised error components estimator here, as well as the r-class which includes it, are special cases of their a-class. In addition readers will see the close dependence on work of Nerlove [8], Balestra and Nerlove [2], Arora [1], Fuller and Battese [3, 4], and Wallace and Hussain [11].

We proceed by restating briefly known results about error components models. Then we develop, in the two component case, the r-class of estimators and the specific member which is optimal. Alternative variance estimators are considered and their relative merit determined. Finally, the three component model is developed and the results summarized.

2. Model

Consider a linear econometric model:

(1)  y = β₀i + Xβ + ε

i is an (NT × 1) vector of ones; y and ε are (NT × 1); X is (NT × k) and is measured as deviations from column means so that X'i = 0; β is (k × 1).

For simplicity there are N elements of the cross section, called "states," and T time elements. Each state-time combination has exactly one observation. Within X, y, and ε, the time subscripts vary most rapidly, so all time observations within a state are adjacent. The model (1) will be transformed by premultiplication by appropriate transformation matrices P₁, P₂, etc., where each P has NT columns. These will all be expressed as Kronecker products, e.g. P = A ⊗ B, where A has N columns and operates on states and B has T columns and operates on time.

Now consider the elements of ε, where ε_ij refers to state i, time j:

(2)  ε_ij = u_ij + v_i + w_j

where each u, v, and w is an independent normal random variable and is independent of X. All results are conditional on X in this small sample analysis. The mean of each component is zero and their variances are σ_u², σ_v², σ_w². The normality assumption is necessary with small samples to provide χ² and F distributions, but we show below that the major results do not depend on these distributions. For the analysis we define two new variances:

(3)  σ₁² = σ_u² + Tσ_v²
     σ₂² = σ_u² + Nσ_w²

It is clear that σ₁² ≥ σ_u² and σ₂² ≥ σ_u². These inequalities play a very important role below. Employing (2) and (3), Eε = 0 and

(4)  V = Eεε' = σ_u²(I ⊗ I) + (σ₁² − σ_u²)(I ⊗ (1/T)J) + (σ₂² − σ_u²)((1/N)J ⊗ I)

where J = ii' is an N × N or T × T matrix of ones.
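A minimal numerical sketch (illustrative values for N, T, and the variance components; not part of the original development) builds V of (4) with Kronecker products and confirms the idempotency of the two non-identity matrices noted below:

```python
import numpy as np

N, T = 4, 5
sig_u2, sig_v2, sig_w2 = 1.0, 0.5, 0.25
sig1_2 = sig_u2 + T * sig_v2            # equation (3)
sig2_2 = sig_u2 + N * sig_w2

I_N, I_T = np.eye(N), np.eye(T)
J_N, J_T = np.ones((N, N)), np.ones((T, T))

# equation (4): V = sig_u^2 (I⊗I) + (sig1^2 - sig_u^2)(I⊗J/T) + (sig2^2 - sig_u^2)(J/N⊗I)
V = (sig_u2 * np.kron(I_N, I_T)
     + (sig1_2 - sig_u2) * np.kron(I_N, J_T / T)
     + (sig2_2 - sig_u2) * np.kron(J_N / N, I_T))

# the two non-identity matrices in (4) are idempotent
for P in (np.kron(I_N, J_T / T), np.kron(J_N / N, I_T)):
    assert np.allclose(P @ P, P)
```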
The square matrices in (4) are each idempotent.

The BLUE estimator of β is Generalized Least Squares (GLS), but this depends on the unknown matrix V. In turn this matrix depends on three unknown elements σ_u², σ₁², and σ₂². Since V need only be known up to a scalar multiple, we define:

(5)  γ₁ = σ_u²/σ₁²
     γ₂ = σ_u²/σ₂²

Thus we need only know γ₁ and γ₂. During some of the development we will consider the model as stated (the three component model). For simplicity we also consider the two component model in which:

(6)  σ_w² = 0,  σ₂² = σ_u²,  γ₂ = 1

In this situation there is one unknown value γ₁, which will be called γ.

3. Useful Transformations

The estimation is facilitated by use of various transformations. Consider a T by T Helmert matrix φ [9, p. 13]. The first column is φ₁ = T^(−1/2) i and the remaining (T−1) columns are φ₂; φ is orthogonal. Therefore

(7)  φφ' = I_T;  φ'φ = I_T;  φ₁'φ₁ = 1;  φ₁φ₁' = (1/T)J;  φ₂φ₂' = I − (1/T)J

     P₁ = I ⊗ φ₁'
     P₂ = I ⊗ φ₂'
     P₃ = I ⊗ φ₁φ₁' = I ⊗ (1/T)J = P₁'P₁
     P₄ = I ⊗ φ₂φ₂' = I ⊗ (I − (1/T)J) = P₂'P₂

where P₃ + P₄ = I and P₃P₄ = 0. When P₁ and P₂ are vertically augmented they yield the orthogonal matrix I ⊗ φ' and so together provide independent data sets which convey all sample information. We also define (where φ is now N × N):

     P₅ = φ₁φ₁' ⊗ I = (1/N)J ⊗ I
     P₆ = φ₂φ₂' ⊗ I = (I − (1/N)J) ⊗ I

so that

(8)  V = σ_u² I + (σ₁² − σ_u²) P₃ + (σ₂² − σ_u²) P₅

We will use this notation to develop initial estimates of β as well as estimates of the unknown variances. We turn exclusively to the two component model until the results in that case are developed. For it:

(9)  V = σ_u²(I − P₃) + σ₁² P₃ = σ_u² P₄ + σ₁² P₃

(10) V⁻¹ = (1/σ_u²) P₄ + (1/σ₁²) P₃   since P₃ + P₄ = I

4. Estimation of σ_u² and σ₁²

Transform (1) by P₁:

(11) P₁y = β₀ P₁i + P₁Xβ + P₁ε

(12) V₁ = E P₁εε'P₁' = σ₁² I

V₁ is N by N and ordinary least squares (OLS) yields BLUE estimates given the transformed data. Define

(13) Z₁ = X'P₁'P₁X = X'P₃X

(14) b₁ = Z₁⁻¹ X'P₃y

(15) b₀ = Σ_ij y_ij / NT = grand mean

(16) var b₁ = σ₁² Z₁⁻¹

(17) var b₀ = σ₁²/NT

The residual vector e₁ is employed to estimate σ₁²:

(18) σ̂₁² = e₁'e₁/q

(19) q = N − 1 − k

qσ̂₁²/σ₁² follows a chi square distribution with q degrees of freedom. Equation (14) implies that we assume Z₁ nonsingular and hence positive definite. The analysis can proceed without this assumption, but that generalization always brings unwanted confusion; the reader may develop results without it.

Similarly P₂ may be employed (and we assume Z₂ nonsingular):

(20) P₂y = P₂Xβ + P₂ε

(21) V₂ = σ_u² I,  an N(T−1) square matrix;  Z₂ = X'P₂'P₂X = X'P₄X

(22) b₂ = Z₂⁻¹ X'P₄y

(23) var b₂ = σ_u² Z₂⁻¹

The residuals e₂ are employed to estimate σ_u²:

(24) σ̂_u² = e₂'e₂/n,  n = N(T−1) − k

As above, nσ̂_u²/σ_u² follows a chi square distribution. We note that b₁ and b₂ are independent and include all the sample information. They are independent of σ̂₁² and σ̂_u², which are also pairwise independent. We will rely on an estimate of γ:

(25) γ̂ = σ̂_u²/σ̂₁²

γ̂/γ is clearly distributed F with (n, q) degrees of freedom.

Before proceeding, consider P₃ and P₄. P₂'P₂ = P₄P₄, so b₂ = b₄. Also e₄ = P₂'e₂, so that

     e₄'e₄ = e₂'P₂P₂'e₂ = e₂'(I ⊗ φ₂'φ₂)e₂ = e₂'e₂

Thus P₂ and P₄ yield identical coefficient vectors and variance estimates. The same can be shown to be true of P₁ and P₃.

P₂ and P₄ are also identical to the direct estimation of (1) by ordinary least squares after (N−1) state dummy variables are added. This is the covariance (CV) estimate and we will refer to b₂ by that name.
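The estimators of this section can be sketched directly. The fragment below (simulated two component data; all names and parameter values are illustrative, not from the paper) computes b₁, b₂, σ̂₁², σ̂_u², and γ̂ using P₃ and P₄, which the text shows give the same results as the Helmert forms P₁ and P₂:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 10, 6, 2

X = rng.normal(size=(N * T, k))
X = X - X.mean(axis=0)                        # deviations from column means, X'i = 0
P3 = np.kron(np.eye(N), np.ones((T, T)) / T)  # state-mean projection
P4 = np.eye(N * T) - P3                       # within-state deviations

beta = np.ones(k)
eps = rng.normal(size=N * T) + np.repeat(rng.normal(0, 0.7, size=N), T)  # u + v
y = X @ beta + eps
y = y - y.mean()                              # remove the grand mean, cf. (15)

Z1, Z2 = X.T @ P3 @ X, X.T @ P4 @ X           # (13), (21)
b1 = np.linalg.solve(Z1, X.T @ P3 @ y)        # between-states estimator (14)
b2 = np.linalg.solve(Z2, X.T @ P4 @ y)        # within / covariance estimator (22)

q, n = N - 1 - k, N * (T - 1) - k             # (19), (24)
e1, e2 = P3 @ y - P3 @ X @ b1, P4 @ y - P4 @ X @ b2
sig1_hat2 = (e1 @ e1) / q                     # (18)
sig_u_hat2 = (e2 @ e2) / n                    # (24)
gamma_hat = sig_u_hat2 / sig1_hat2            # (25)
```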
For the development we employ only these estimators of the variances, but other estimators exist which are in some ways superior; these are discussed below. We will now develop the Generalized Least Squares and Error Components estimators based on the analysis so far.

5. Generalized Least Squares

The GLS estimator can be written employing this notation:

     b_g = [X'V⁻¹X]⁻¹[X'V⁻¹y],   X = P₃X + P₄X

Employing (10), (13), and (21),

     X'V⁻¹X = X'(P₃ + P₄)[(1/σ_u²)P₄ + (1/σ₁²)P₃](P₃ + P₄)X
            = (1/σ_u²)X'P₄X + (1/σ₁²)X'P₃X
            = (1/σ_u²)Z₂ + (1/σ₁²)Z₁
            = (1/σ_u²)[Z₂ + γZ₁]

Employing (5), (14), and (22),

(26) b_g = [Z₂ + γZ₁]⁻¹[X'P₄y + γX'P₃y]

(27) b_g = W₂*b₂ + W₁*b₁

where W₂* = [Z₂ + γZ₁]⁻¹Z₂, W₁* = γ[Z₂ + γZ₁]⁻¹Z₁, and W₂* + W₁* = I.

(28) var(b_g) = σ_u²[Z₂ + γZ₁]⁻¹

These are the well known results which depend on known values for σ_u² and σ₁², or γ. (27) shows that b_g is a weighted average of b₂ and b₁.

6. Revised Error Components

We introduce a class of estimators, the r-class, of which one member will be Revised Error Components (REC). For the r-class, γ in (26) is replaced by rγ̂ where 0 ≤ r. The r-class is a subset of the a-class of estimators discussed by Swamy and Arora. However, their purpose was the broad proof of asymptotic distribution, etc., and their multiple parameters, while desirable for that purpose, are not desirable here. The r-class estimator is:

(29) b_r = [Z₂ + rγ̂Z₁]⁻¹[Z₂b₂ + rγ̂Z₁b₁] = W_2r b₂ + W_1r b₁

where W_2r + W_1r = I. Then

     E(b_r|γ̂) = W_2r Eb₂ + W_1r Eb₁ = (W_2r + W_1r)β = β

Hence b_r is conditionally unbiased for any γ̂ and r, and so is also unconditionally unbiased. The variance of b_r conditional on γ̂ is

(30) var(b_r|γ̂) = σ_u²[Z₂ + rγ̂Z₁]⁻¹[Z₂ + (r²γ̂²/γ)Z₁][Z₂ + rγ̂Z₁]⁻¹

Although expressions like (30) are frequently employed to characterize the efficiency of (29), their use is incorrect because γ̂ is a random variable estimated in the process. The correct matrix is the expectation of (30) taken over γ̂.

The r-class is interesting because it includes other major estimators as special cases. Error components (EC) has r = 1. Covariance (CV) has r = 0. GLS has r = 1 and γ̂ = γ. Finally REC, which is specified below, is a member of this class.

Since every member of the r-class is unbiased, we develop REC by minimizing the variances. In order to proceed we need additional assumptions about Z₁ and Z₂. For the development of estimates we employ an overly strict assumption and subsequently relax it. Our final assumptions are more specific than we would desire, but we believe that other analysis, including Monte Carlo analysis, will show the particular estimator to be more efficient than alternatives in any real case. This is because there is substantial breadth to what is employable in our final assumptions.
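The r-class (29) reduces to a short function; a sketch reusing the names of the previous fragment (r = 1 gives EC, r = 0 gives CV):

```python
import numpy as np

def r_class(r, gamma_hat, Z1, Z2, b1, b2):
    """r-class estimator (29): a matrix-weighted average of b2 and b1."""
    A = Z2 + r * gamma_hat * Z1
    return np.linalg.solve(A, Z2 @ b2 + r * gamma_hat * Z1 @ b1)

b_ec = r_class(1.0, gamma_hat, Z1, Z2, b1, b2)   # error components
b_cv = r_class(0.0, gamma_hat, Z1, Z2, b1, b2)   # covariance (equals b2)
```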
7. Relation Between Z₁ and Z₂

For this discussion we consider three different factors entering Z₁ and Z₂. These matrices have different numbers of underlying observations; those observations have different variation; and the pattern of variation is different. The first two elements, if considered without the third, lead to our two matrices being proportional to each other.

Consider the number of observations: Z₁ derives from P₁ and has N state means, while Z₂ (from P₄) has N(T−1) deviations from those means. Hence Z₂ has (T−1) times the observations of Z₁ and will be (T−1) times as large from this factor alone.

Second, we observe that often variation between states is larger (say m times as large) than variation within states, so that each observation making up Z₁ will be m times as large as one in Z₂. Combining these two elements, Z₁ is m/(T−1) = λ times the size of Z₂.

The third source of variation is the "pattern of variation." Thus two variables in X may be very collinear within states (with large cross product in Z₂), but unrelated between states. One very important reason for employing error components estimation rather than covariance is the full use of this information source.

We temporarily restrict ourselves to the first two sources of variation and make this assumption:

A1   Z₁ is proportional to Z₂:

(31) Z₁ = λZ₂

8. Optimization of r and Efficiency Comparison

Rather than looking directly at the variance of b_r, we define the proportion by which each variance matrix element exceeds the corresponding element in the variance of the GLS estimator. Our assumption A1 makes this relative variance (RV) a constant scalar for each matrix. Thus for GLS under A1:

(32) var(b_g) = σ_u²(1 + λγ)⁻¹ Z₂⁻¹

For the r-class:

(33) var(b_r|γ̂) = [1 + RV(b_r|γ̂)] var(b_g)

(34) RV(b_r|γ̂) = (1 + λγ)(1 + λr²γ̂²/γ)/(1 + λrγ̂)² − 1 = λγ(1 − rγ̂/γ)²/(1 + λrγ̂)²

For CV we set r = 0 and for EC we set r = 1:

(35) CV: RV(b₂) = λγ

(36) EC: RV(b_e|γ̂) = λγ(1 − γ̂/γ)²/(1 + λγ̂)²

Our goal now is to minimize (34) with respect to r. The resulting estimator will have its RV compared with (35) and (36). Clearly we remain interested in the expected value of (34) and (36) over the distribution of γ̂.

Examine the denominator of (34). Employing the inequality at (3), we have σ₁² ≥ σ_u², or γ ≤ 1. Because the true parameter satisfies this constraint, we will apply it also to the estimate. Thus (25) is replaced by

(37) γ̂ = σ̂_u²/σ̂₁²   if σ̂_u² ≤ σ̂₁²
     γ̂ = 1           if σ̂₁² < σ̂_u²

This constraint is widely employed in current estimation by error components. Given this constraint:

(38) 1 ≤ 1 + λrγ̂ ≤ 1 + λr

In addition, from the discussion of section 7 we know λ to be fairly small in general, and below we set r ≤ 1. Thus the bounds provided by the inequality are relatively narrow. We note however that under the relaxed assumptions below the counterpart of λ may be greater than 1. Employ (38) with (34):

(39) RV(b_r|γ̂) < λγ(1 − rγ̂/γ)²
     RV(b_r|γ̂) > λγ(1 − rγ̂/γ)²/(1 + λr)²

A similar pair of inequalities holds for EC when we set r = 1 in (39).

Rather than minimizing the relative variance over r, we will minimize each limit as given by (39). Consider now the distribution of γ̂/γ. Define its mean and second moment:

(40) μ = E(γ̂/γ),  σ² + μ² = E(γ̂/γ)²

Then take the expectation of (39):

(41) RV(b_r) = E[RV(b_r|γ̂)] < λγ[1 − 2rμ + r²(σ² + μ²)]
     RV(b_r) > λγ[1 − 2rμ + r²(σ² + μ²)]/(1 + λr)²

The optimal r values for the upper and lower limits are:

(42) r₁ = μ/(σ² + μ²)
     r₂ = (μ + λ)/(σ² + μ² + λ)

Since we wish to minimize the largest possible variance, we place greatest weight on the upper limit value, r₁. Also we note that as λ approaches zero, r₂ goes to r₁, and that r₁ is free of any dependence on λ.

Thus REC, the revised error components estimator, is defined as the member of the r-class with r = r₁. The estimator is called b*. This value in principle can be determined whether or not A1 is true, and we advocate its use in general. As previously indicated, we shall below indicate other cases where it is optimal.
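Given the first two moments of γ̂/γ, the optimal values in (42) are immediate; a minimal sketch (function and argument names are illustrative):

```python
def optimal_r(mu, sigma2, lam=0.0):
    """Optimal r for the bounds in (41): r1 for the upper limit, r2 for the lower (42)."""
    m2 = sigma2 + mu**2               # sigma^2 + mu^2 = E(gamma_hat/gamma)^2
    r1 = mu / m2                      # free of any dependence on lambda
    r2 = (mu + lam) / (m2 + lam)
    return r1, r2
```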
Under assumption A1 we find these conclusions concerning EC, CV, and REC:

C1.  REC is more efficient than CV.

C2.  REC is known to be more efficient than EC in this sense:
     a. The upper limit of the inequality (41) for REC is below the upper limit for EC.
     b. The lower limit of the inequality is below the lower limit for EC as long as λ² < σ² + μ². Since μ > 1 in most cases, this allows a very large λ.
     c. RV(b*) is strictly less than RV(b_e) by (41) if (1 + λ)² < σ² + μ².
     We believe in practice b* will always be superior to b_e.

C3.  EC is more efficient than CV if σ² + μ² < 2μ. It is less efficient if σ² + μ² > 2(μ + λ) + λ². Intermediate cases are uncertain.

Following section 10 we provide a specific simple example so one can see the numerical efficiency gain.

9. Distribution of γ̂

The preceding results are specific and useful for any distribution of γ̂. We now turn to the specific distribution that arises from our estimates (25) and (37). The distribution stated below (25) depends on the original assumption that all errors have normal distributions. In this case γ̂/γ was F with n and q degrees of freedom. For the present we ignore the restriction that γ̂ ≤ 1 but return to it later. We recall that n, defined in (24), depends on NT and will in almost every case be fairly large. q, on the other hand, is N − 1 − k and can be very small. For example, one could have N = 7, k = 5, so q = 1. Indeed an important reason for later considering other estimates of σ₁² is that q here can be zero, so the estimate would not exist.

For this F distribution we have (for q > 4):

     μ = q/(q − 2)
     σ² + μ² = q²(n + 2)/[(q − 2)(q − 4)n]

(43) r₁ = (q − 4)n/[q(n + 2)]

We see that the mean of F becomes infinite when q = 2 and σ² does when q = 4.

In (38) we found that the fact that γ̂ ≤ 1 was very useful. It plays an extremely important role here, too. Under (37), σ̂₁² ≥ σ̂_u², so 0 ≤ γ̂ ≤ 1 and 0 ≤ γ̂/γ ≤ 1/γ. Thus the range of γ̂ is finite and all moments therefore must exist. The distribution is now a "truncated F" [F(n, q, F*)] with (n, q) degrees of freedom and an upper limit F*. By this we mean the corresponding F for values 0 ≤ F < F*, with a "spike" at F* such that P(F = F*) = P(F ≥ F*). In our situation F* = 1/γ. Since γ is an unknown parameter we cannot know F* or the resulting mean or variance of F, but we can examine them conditional on various F*.

The moments of F(n, q, F*) were found by numerical integration and appear in Table 1. n → ∞ was employed since, in general use, n will be large. The truncation reduces μ and (σ² + μ²) and also increases r. It is clear that a prior point estimate or distribution of γ, and hence F*, is required to select the most efficient estimator, and the reader may consider the optimal use of such prior information. We propose that a broad rule of thumb be employed and will show where it is better than alternatives. The rule is

(44) r = (q + 4)n/[(q + 11)(n + 2)]   if q < 15
     r = (q − 4)n/[q(n + 2)]          if q ≥ 15

This rule was devised as a simple way to approximate Table 1 when F* = 3 and to include the effect of n. It may be shown that it leads to a smaller relative variance than EC for the values in the table as long as either F* ≤ 5 or q ≥ 4. REC is now defined as the member of the r-class with r defined as in (44). Two conclusions can be derived.

C4.  REC is superior to both EC and CV if F* ≤ 5 or q ≥ 4. If F* > 5 and q ≤ 3 then the r-class should be employed with an appropriate (small) value of r.

C5.  EC is superior to CV if: F* = 2 for any q; F* = 3 and q ≥ 4; F* = 5 and q ≥ 7; F* > 5 and q ≥ 8. This is found by direct application of C3 to Table 1.

Clearly the small sample properties are very dependent on the true unknown γ.
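A sketch of the computation behind Table 1 and the rule of thumb (44) follows (it assumes scipy is available; the spike at F* carries the probability P(F ≥ F*), as defined above, and the printed cell should approximately reproduce the q = 5, F* = 3 entries of Table 1):

```python
from scipy import stats
from scipy.integrate import quad

def truncated_f_moments(n, q, f_star):
    """Mean, second moment, and r = mu/(sigma^2 + mu^2) of the truncated F."""
    F = stats.f(n, q)
    spike = F.sf(f_star)                                    # P(F >= F*)
    m1 = quad(lambda x: x * F.pdf(x), 0, f_star)[0] + f_star * spike
    m2 = quad(lambda x: x**2 * F.pdf(x), 0, f_star)[0] + f_star**2 * spike
    return m1, m2, m1 / m2

def rule_44(n, q):
    """Rule of thumb (44)."""
    if q < 15:
        return (q + 4) * n / ((q + 11) * (n + 2))
    return (q - 4) * n / (q * (n + 2))

# large n stands in for the n -> infinity used in Table 1;
# the q = 5, F* = 3 cell of Table 1 is (1.39, 2.60, .53)
print(truncated_f_moments(10_000, 5, 3.0))
print(rule_44(98, 5))   # the example value .551 used in section 10
```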
Monte Carlo studies should reach various conclusions according to the variances selected in advance for analysis. It is important to note that in Table 1 the biggest effect on r is created by F* and not by the degrees of freedom. This implies intuitively that the results will not materially be altered if the underlying distribution is different from F, as long as the truncation is present.

Table 1
Truncated F Distribution: Mean, Variance, and Optimal r
(numerator degrees of freedom n → ∞; q = denominator degrees of freedom)

                               Truncation F*
 q                   2       3       5      10       ∞
 1   μ            1.44    1.92    2.69    4.14       *
     σ² + μ²      2.54    4.90   11.01   32.35       *
     r             .57     .39     .24     .13       *
 2   μ            1.35    1.68    2.13    2.78       *
     σ² + μ²      2.23    3.87    7.41   16.82       *
     r             .60     .43     .29     .16       *
 3   μ            1.30    1.55    1.83    2.16    3.00
     σ² + μ²      2.05    3.29    5.52   10.10       *
     r             .63     .47     .33     .21       *
 4   μ            1.26    1.46    1.65    1.82    2.00
     σ² + μ²      1.93    2.89    4.36    6.72       *
     r             .65     .51     .38     .27       *
 5   μ            1.24    1.39    1.52    1.62    1.67
     σ² + μ²      1.84    2.60    3.59    4.87    8.33
     r             .67     .53     .42     .33     .20
 6   μ            1.22    1.34    1.43    1.49    1.50
     σ² + μ²      1.77    2.38    3.05    3.78    4.50
     r             .69     .56     .47     .39     .33
 7   μ            1.20    1.30    1.37    1.40    1.40
     σ² + μ²      1.71    2.20    2.67    3.10    3.27
     r             .70     .59     .51     .45     .42
 8   μ            1.19    1.27    1.32    1.33    1.33
     σ² + μ²      1.65    2.06    2.39    2.66    2.67
     r             .72     .62     .55     .50     .50

Note: r = μ/(σ² + μ²); * denotes an infinite value.

We have at this point developed the REC estimator and have shown it to be broadly superior to EC and to CV. It remains to expand the analysis in three ways: first, to relax the initial assumption A1; second, to consider other estimators of σ₁² and σ_u²; and third, to present the estimators for a three component model.

10. Alternative Assumptions about Z

The analysis above depends on A1 and we now discuss a variety of means by which comparable results may be achieved. While the family ultimately outlined here is not all inclusive, it contains a wide variety of different kinds of situations. Initially we will state the major alternative assumption and then, since its meaning may not be instantly clear, we will consider specific situations under which it will arise.

For the matrices Z₁ or Z₂ we may write eigenvectors and eigenvalues in a matrix equation, e.g.

     Z₂A₂ = A₂Λ₂

where A₂ is an orthogonal matrix of eigenvectors and Λ₂ a diagonal matrix of eigenvalues. A₂ may not be unique because its columns may be permuted along with the elements of Λ₂, and if there are multiple eigenvalues, there are multiple solutions. We assume the following:

A2   For the matrices Z₂ and Z₁ there exist matrices of eigenvectors A₂ and A₁ such that A₂ = A₁ = A. It is permissible to perform a single variance transformation on both Z matrices before finding eigenvectors.

One way of achieving this is assumption A1 above. Another direct way is the following:

A3   Z₁ and Z₂ are diagonal matrices.

This would arise when all the Xs are uncorrelated both within states and between states. Incidentally, this clearly includes the case where there is a single regressor.

Other possibilities are best approached by first performing a variance transformation on Z₁ and Z₂ as mentioned in the assumption. Form the diagonal matrix Q such that each diagonal element is the square root of the reciprocal of the corresponding diagonal element of Z₂, i.e.

(45) Q_ii = Z₂,ii^(−1/2),  Q_ij = 0 for i ≠ j

Define

(46) Z₂* = QZ₂Q,  Z₁* = QZ₁Q

It is clear that by construction the diagonal of Z₂* consists of ones. We assume:

A4   The diagonal elements of Z₁* are equal to each other. The implication of this is that variances in Z₁ are proportional to those in Z₂.

This transformation will be made before A2 is applied.
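The transformation (45)–(46) and a check of A2 can be sketched as follows (a minimal illustration; for symmetric positive definite matrices, A2 holds exactly when the transformed matrices commute, since commuting symmetric matrices are simultaneously diagonalizable):

```python
import numpy as np

def variance_transform(Z2, Z1):
    """Apply Q of (45) to both matrices, as in (46)."""
    Q = np.diag(1.0 / np.sqrt(np.diag(Z2)))
    return Q @ Z2 @ Q, Q @ Z1 @ Q

def satisfies_A2(Z2s, Z1s, tol=1e-10):
    """A2 (common eigenvectors) holds iff the transformed matrices commute."""
    return np.allclose(Z2s @ Z1s, Z1s @ Z2s, atol=tol)
```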
Its use, as seen below, is to transform both X and β without changing the content of the analysis.

Along with A4, any one of the following is sufficient for A2:

A5   There are two explanatory variables in the model (k = 2).

A6   Z₂* and Z₁* each have a single constant everywhere off the main diagonal (but these two constants may be different).

A7   Z₂* and Z₁* are each circular symmetric matrices.

A8   Z₂* and Z₁* are each tridiagonal matrices.

See Press [9] for specific discussion of the latter two assumptions. Another possibility is that Z₁ and Z₂ are each conformably block diagonal with separate blocks satisfying different assumptions above. In general terms these assumptions say that one makes a single variance transformation to Z₁ and Z₂ and the results are pattern matrices of a single type but with different coefficients.

Now we proceed to assume A4 and A2 are consecutively employed:

(47) Z₂*A = (QZ₂Q)A = AΛ₂
     Z₁*A = (QZ₁Q)A = AΛ₁

Define M as the reciprocal square root of Λ₂ so that:

(48) M²Λ₂ = I

Reconsider the original model (1):

(49) y = β₀i + Xβ + ε = β₀i + XQAM(QAM)⁻¹β + ε

Define

(50) δ = (QAM)⁻¹β

so that

(51) y = β₀i + (XQAM)δ + ε

We will now examine the variance matrix of the transformed coefficient vector δ. This will be minimized in the sense that alternative estimators will have variance matrices which differ by a positive semi-definite matrix, indicating the results will also hold for estimates of β. The transformation QAM alters the matrices Z₁ and Z₂. These become:

(52) Z₂** = MA'QZ₂QAM = MΛ₂M = I
     Z₁** = MA'QZ₁QAM = MΛ₁M = Λ₁Λ₂⁻¹ = Λ

Λ is a diagonal matrix whose i-th diagonal element is the ratio of the eigenvalue of Z₁ to the corresponding eigenvalue of Z₂:

(53) λᵢ = λ₁ᵢ/λ₂ᵢ

We proceed directly to the variance of b_r (30), where b_r is now an estimate of δ:

(54) var(b_r|γ̂) = σ_u²[I + rγ̂Λ]⁻¹[I + (r²γ̂²/γ)Λ][I + rγ̂Λ]⁻¹

Clearly all off-diagonal elements (covariances) are zero and the diagonal elements may be written:

(55) var(b_ri|γ̂) = σ_u²[1 + λᵢrγ̂]⁻²[1 + (r²γ̂²/γ)λᵢ]

Previously this depended on a scalar λ for the entire matrix, but now it depends on λᵢ. Relative variance continues to be as in (33) but is unique for each parameter in δ:

(56) RV(b_ri|γ̂) = λᵢγ(1 − rγ̂/γ)²/(1 + λᵢrγ̂)²

This result is identical to (34) except that the subscript is added to λ. Our justification for choosing the upper limit value in (42) to determine r is strengthened because we are free of dependence on λ. All subsequent results hold in their entirety but now apply to each parameter being estimated. C1 can be restated as follows: the variance of the CV estimator of δ exceeds that of the REC estimator by a positive semi-definite matrix. Since this difference is true for estimators of δ, it is also true of the comparable estimators of β. Each other conclusion has a comparable interpretation, showing in particular REC to be better than the alternatives. We should note that in some cases λᵢ may be fairly large, leading to wide inequalities in (41) but also to larger relative efficiency differences.

An example is presented to make the potential efficiency gain more concrete. This example falls under A4 and A5:

     X'P₄X = Z₂ = [ 16  −36]
                  [−36  100]

     X'P₃X = Z₁ = [  4    8]
                  [  8   25]

The only restriction used is that variances are proportional, i.e. 16/100 = 4/25. The example was selected so the pattern of variation differs substantially; that is, the cross products are large negative and positive numbers (−36 and +8).
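The computation reported next can be verified with a few lines; a sketch using the example's matrices:

```python
import numpy as np

Z2 = np.array([[16., -36.], [-36., 100.]])
Z1 = np.array([[4., 8.], [8., 25.]])

Q = np.diag(1.0 / np.sqrt(np.diag(Z2)))     # diag(.25, .10), as in (45)
Z2s, Z1s = Q @ Z2 @ Q, Q @ Z1 @ Q

lam2, A = np.linalg.eigh(Z2s)               # eigenvalues .1 and 1.9
lam1 = np.diag(A.T @ Z1s @ A)               # .45 and .05 (same eigenvectors, A2)
print(lam1 / lam2)                          # lambda_i of (53): 4.5 and .0263
```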
The variance transformation is performed and the eigenvectors found:

     Q = [.25   0 ]        A = (1/√2) [1   1]
         [ 0   .10]                   [1  −1]

The eigenvalues are:

     λ₂ᵢ = [.1, 1.9],  λ₁ᵢ = [.45, .05]
     λ₁ = 4.5,  λ₂ = .02631

Assume also γ = 1/3, q = 5, n = 98. Then by (44), r = .551.

For δ₁:   RV(CV) = 1.5
          RV(EC) ≤ 1.5(.82) = 1.23
          RV(REC) ≤ 1.5(.26) = .39

For δ₂:   RV(CV) = .0088
          RV(EC) ≤ .0072
          RV(REC) ≤ .0022

The figures indicate that there is little difference among the three estimators relative to δ₂ but substantial difference for δ₁. For δ₁, CV has variance that is 150% larger than GLS with known variance matrix, while EC has variance up to 123% larger and REC up to 39% larger. Thus the variance of CV is 12% greater than that of EC and 80% greater than that of REC. For δ₂ the ordering is the same, but REC is only known to be 0.66% superior to CV.

This concludes the discussion of alternative assumptions about Z. We turn in the next section to alternative variance estimates.

11. Alternative Variance Estimates

The estimates above of σ₁² and σ_u², (18) and (24), are not the only such estimates. Arora [1] employs those given here. Fuller and Battese [3, 4] use σ̂_u², but for σ₁² employ the estimator below, while Wallace and Hussain [11] use estimates which are each different. We will consider the merits of the major alternative forms.

Recall that σ̂₁² and σ̂_u² are advantageous because they are each chi square with q and n degrees of freedom respectively. Their disadvantage, which we will show, is that they "lose" more degrees of freedom than the model requires and hence are, in certain cases, inefficient. This is of importance particularly when there are few (or zero) degrees of freedom.

In the following we need some known results concerning quadratic forms.

1. Assume x is N(0, σ²I) and A is idempotent of rank r. Then Q = x'Ax is σ² times a chi square(r) variable. Its cumulants are [6, p. 168]

(57) κ_s(Q) = C_s r,  where C_s = 2^(s−1)(s−1)!

The cumulants convey the same information as moments [5, p. 20]. The first four are κ₁ = μ₁' (the mean), κ₂ = μ₂, κ₃ = μ₃, and κ₄ = μ₄ − 3μ₂², where μ₂, μ₃, and μ₄ are the second, third, and fourth moments about the mean. All cumulants for s ≥ 3 in a normal distribution are zero.

2. Assume x ~ N(0, V) and A is a general symmetric matrix. Then Q = x'Ax has the following cumulants [7, p. 153]:

(58) κ_s(Q) = C_s Σᵢ λᵢˢ = C_s tr (AV)ˢ

where the λᵢ are the eigenvalues of AV and tr (AV)ˢ is the trace of its s-th power.

3. Assume x ~ N(0, V) and A and B are general symmetric matrices. Then covar(x'Ax, x'Bx) = 2 tr (AVBV). This is obvious from the variance of x'(A + B)x.

Now we will examine variance estimates of the type employed by Wallace and Hussain [11]. First employ ordinary least squares to estimate (1) and find the residual vector e:

     e = Mε = [I − (1/NT)J − X(X'X)⁻¹X']ε

Define

     Q₃° = e'P₃e = ε'MP₃Mε
     Q₄° = e'P₄e = ε'MP₄Mε

The distributions of Q₃° and Q₄° depend on

     Q₃ = P₃MVMP₃,  Q₄ = P₄MVMP₄

where these are symmetric matrices which serve the role of AV in (58). First examine Q₄:

     Q₄ = σ_u²(P₄ − K₄°) + (σ₁² − σ_u²) P₄X(X'X)⁻¹X'P₃X(X'X)⁻¹X'P₄

where

     K_i° = P_iX(X'X)⁻¹X'P_i
     K_i = (X'X)⁻¹X'P_iX
     k_i = tr K_i = tr K_i°
     K_ij = K_iK_j, etc., for any sequence of subscripts

and k_{i^r j^s} is k with r subscripts each equal to i followed by s subscripts each equal to j. K_i = (Z₁ + Z₂)⁻¹Z_i and is non-negative definite. Hence

     0 ≤ k_i ≤ k;  k₃ + k₄ = k;  k₃₃ + k₃₄ = k₃, etc., for any complementary pair

Employing eigenvalues one may show k₃₃ ≥ k₃²/k and k₃₄ ≤ k₃k₄/k.
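The cumulant formula (58) is used repeatedly in what follows; as a compact reference, a sketch:

```python
import numpy as np
from math import factorial

def quad_form_cumulant(A, V, s):
    """kappa_s of x'Ax with x ~ N(0, V), by (58): C_s tr((AV)^s)."""
    C_s = 2 ** (s - 1) * factorial(s - 1)
    return C_s * np.trace(np.linalg.matrix_power(A @ V, s))
```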
We may proceed further under the assumptions employed above.

Under A1:  k_{3^r 4^p} = μ^r (1 − μ)^p k,   with μ = λ/(1 + λ)

Under A2:  k_{3^r 4^p} = Σᵢ μᵢ^r (1 − μᵢ)^p,   with μᵢ = λᵢ/(1 + λᵢ)

12. Estimate of σ_u²

Define

(59) σ_u*² = Q₄°/m₄

(60) m₄ = N(T − 1) − k₄

(61) E σ_u*² = σ_u²(1 + y k₃₄/m₄)

(62) y = (σ₁²/σ_u²) − 1

(63) var σ_u*² = (2σ_u⁴/m₄)[1 − (1/m₄){k₃₄ − 2y k₃₄₃ − y² k₃₄₃₃}]

(61) and (63) show the dependence of this estimator on y and therefore on σ₁²/σ_u². Hence we must assume y is small. Given this and the small size of k₃₄, we note that σ_u*² is approximately unbiased and the bracketed term in the variance equation is approximately 1. If y is small the cumulants are:

(64) κ_s(σ_u*²) = C_s σ_u^(2s) m₄^(1−s) [1 − (1/m₄)(k − k_{3^s}) + O(k₃₄/m₄)]

where k_{3^s} is k with s subscripts each equal to 3. The last term signifies terms of order at most k₃₄/m₄. Deleting that term,

(65) κ_s(σ_u*²) ≤ C_s σ_u^(2s) m₄^(1−s)

which are the cumulants of a chi square with m₄ degrees of freedom. Hence σ_u*² is superior, in the sense of cumulants, to such a distribution and hence also to σ̂_u².

The smallness of y can be made more specific by employing A2. In that case

(66) κ_s(σ_u*²) = C_s σ_u^(2s) m₄^(1−s) [1 + (1/m₄){−k + Σᵢ μᵢˢ(1 + y − yμᵢ)ˢ}]

This is superior to χ²(m₄) if the term in braces is negative for all s. In turn this is true if each element in the sum is less than or equal to 1, or

     1 ≥ μᵢ(1 + y − yμᵢ),   i.e.  y ≤ 1/μᵢ

This is also a necessary condition. In our example above λᵢ = (4.5, .026), so μᵢ = (.82, .026); thus y ≤ 1.22, or σ₁² ≤ 2.22σ_u², or γ ≥ .45. In every case under A2, σ₁² ≤ 2σ_u² will suffice, so that all cumulants of σ_u*² are less than those of σ̂_u².

In many cases under A1 or A2, the largest λ will be, say, 1/10. In this case μ = 1/11 and y ≤ 11 is sufficient for σ_u*² to be superior to σ̂_u². Therefore we conclude that σ_u*² is the superior estimator in cumulants if either σ₁²/σ_u² is small or the largest root λ is small.

Despite the advantages of σ_u*², the gain will in almost every case be small because there are many degrees of freedom. Hence it seems wise to generally employ σ̂_u².

13. Estimates of σ₁²

By the same approach,

     Q₃ = σ₁²[(I − (1/N)J) ⊗ (1/T)J − 2K₃° + K₃₃°] + σ_u²[K₃° − K₃₃°]

Define

(67) σ₁*² = [Q₃° − σ̂_u² k₃₄]/m₃

(68) m₃ = N − 1 − k + k₄₄

This is the estimator of Fuller and Battese. For simplicity one could employ Q₃°/(N − 1) or Q₃°/m₃, but these require assumptions that σ_u², etc., are small. Cumulants can be found generally:

(69) κ_s(σ₁*²) = C_s σ₁^(2s) m₃^(1−s) [1 − (1/m₃)(k₄₄ − k_{4^s}) + O(γk₃₄/m₃)]

The last term depends on m₄, which may not be very small, but it also carries γ = σ_u²/σ₁², which is always less than one. (69) includes the effect of σ̂_u². To make (69) clearer we now employ A2. Under it:

(70) κ_s(σ₁*²) = C_s σ₁^(2s) m₃^(1−s) [1 − (1/m₃) Σᵢ (1 − μᵢ)² {1 − (1 − μᵢ)^(s−2) (1 − μᵢ + γμᵢ)ˢ} + (1/m₃)(N(T−1) − k)^(−s) (−γk₃₄)ˢ]

(71) κ_s(σ₁*²) ≤ C_s σ₁^(2s) m₃^(1−s)

The inequality in (71) is true for s ≥ 3 without any additional assumptions. When s = 2, the requirement is that γ be a small amount less than one. One form of requirement for (71) to hold is:

     2[N(T − 1) − k] ≥ k/(1 − γ)^(1/2)

For example, if γ = .99 and k = 4, then (71) is true if N(T − 1) ≥ 24.
This depends on the fact that at least one μᵢ ≤ 1/2. Our real interest is var(σ₁*²), and one can show var σ₁*² ≤ var σ̂₁² if N(T − 1) − k ≥ k/4, which will of course always be true.

We have shown that σ₁*² is cumulant superior to a chi square with m₃ degrees of freedom as well as to σ̂₁², and hence it should always be employed. Clearly there will be some cases with small N where σ₁*² exists and the alternative does not. We will employ the chi square distribution as an approximation as we return to REC, but we note that the truncation of the F distribution means the exact distribution is not required in any event. Thus we propose that REC be employed as above but with γ̂* = σ̂_u²/σ₁*² replacing γ̂. The new degrees of freedom are used in defining r (m₃ replaces q).

In this section we have shown that the estimate of σ₁² of Fuller and Battese is superior to that of Arora, but that both serve well in the REC estimator.

14. Three Component REC Model

Now consider the original model with variance matrix given by (4). The analysis was developed with σ₂² = σ_u², but that assumption is now dropped and the third component restored. We employ four transformations defined as in section 3:

(72) P₃₅ = P₃P₅    P₃₆ = P₃P₆
     P₄₅ = P₄P₅    P₄₆ = P₄P₆

One may show P₃₅ + P₄₅ = P₅; P₃₆ + P₄₆ = P₆; P₃₅ + P₃₆ = P₃; etc. When the model is transformed by P₃₅ we have the grand mean for all observations, which is zero within X. P₄₅ yields mean values for each time period; P₃₆ yields mean values for each state; and P₄₆ yields deviations from both the state mean and the time mean. When regressions are performed on each transformed model in turn:

     P₃₅ yields only the estimate of β₀.
     P₄₆ yields b₄ and a new σ̂_u², which has [(N − 1)(T − 1) − k] degrees of freedom.
     P₃₆ yields b₁ and σ̂₁², which are identical to b₁ and σ̂₁² above.
     P₄₅ yields b₅ and σ̂₂², where the latter is chi square with T − 1 − k degrees of freedom.

We could also define σ₁*² and σ₂*² as was done in the preceding section, with the same implications. All of the estimators under consideration are now weighted averages of b₄, b₁, and b₅. Define the following:

     γ₁ = σ_u²/σ₁²,  γ₂ = σ_u²/σ₂²

     Z₄ = X'P₄₆X
     Z₅ = X'P₄₅X
     Z₁ = X'P₃₆X = X'P₃X
     Z₄ + Z₅ = Z₂

     W₀ = Z₄ + r_a γ̂₁Z₁ + r_b γ̂₂Z₅
     W₄ = W₀⁻¹Z₄
     W₁ = r_a γ̂₁ W₀⁻¹Z₁
     W₅ = r_b γ̂₂ W₀⁻¹Z₅

Then the r-class estimator is defined as:

(73) b_r = W₄b₄ + W₁b₁ + W₅b₅

This estimator is again identical to the a-class of Swamy and Arora. Indeed the entire seven parameter a-class can be expressed as part of the two parameter r-class if we set:

(74) r_a = a₇/[a₆(a₁ + a₂)],  r_b = a₃/[a₆(a₄ + a₅)]

Although the a notation was desirable for the purposes of that article, the r notation is simpler and more sensible for general estimation purposes.

There are many specific estimators available for a three component model such as this one. For each of the two ratios γ₁ and γ₂ we may employ EC, CV, REC, or omit the component and return to a two component model. These imply, respectively, that r_a = 1, r_a = 0, r_a = r_a*, or r_a γ̂₁ = 1. Since the same four options are available for the two components, there are 4 × 4 = 16 different specifications clearly available.

The variance of b_r is:

(75) var(b_r|γ̂₁, γ̂₂) = σ_u² W₀⁻¹[Z₄ + (r_a²γ̂₁²/γ₁)Z₁ + (r_b²γ̂₂²/γ₂)Z₅]W₀⁻¹

We now proceed directly to an assumption about Z similar to that made above:

A9   After a single variance transformation Q is applied to Z₁, Z₅, and Z₄, such that Z₁* = QZ₁Q etc., there exist eigenvector matrices for the transformed Z that are identical.

This is applied as above to transform the parameter vector to a new vector δ. The eigenvalues are contained in Λ₄, Λ₁, and Λ₅.
Define

     λ_ai = λ₁ᵢ/λ₄ᵢ,  λ_bi = λ₅ᵢ/λ₄ᵢ

Then the relative variance of b_ri may be shown to be:

(76) RV(b_ri|γ̂₁, γ̂₂) = Q*/(1 + λ_ai r_a γ̂₁ + λ_bi r_b γ̂₂)² ≤ Q*

     Q* = λ_ai γ₁(1 − r_a γ̂₁/γ₁)² + λ_bi γ₂(1 − r_b γ̂₂/γ₂)² + λ_ai λ_bi γ₁γ₂(r_a γ̂₁/γ₁ − r_b γ̂₂/γ₂)²

Here μ₁ and σ₁² are the mean and variance of (γ̂₁/γ₁); μ₂ and σ₂² are the mean and variance of (γ̂₂/γ₂); and σ₁₂ is their covariance. (In this passage σ₁² and σ₂² denote moments of these ratios, not the variance components of section 2.) If σ̂_u², σ̂₁², and σ̂₂² are respectively chi square with n°, q₁, and q₂ degrees of freedom, then γ̂₁ and γ̂₂ are each F with a common numerator but independent denominators. If these distributions are not truncated, then:

(77) σ₁₂ = (2/n°) μ₁μ₂

This value will be small when n° is large. We minimize Q* over r_a and r_b. The optimal value r_a* after some computation is found to be:

(78) r_a* = r₁[1 + λ_bi γ₂{1 − r_b*(μ₂ + σ₁₂/μ₁)}]

r₁ is the optimal r value for the two component model defined in (42). A similar equation holds for r_b*. It is seen from this that r_a* is approximated by r₁ if (λ_bi γ₂) is small or if q₂ is not small (so that r_b* is approximately one). Because this approximation exists, we define the three component REC estimator as the r-class with r_a = r₁ and r_b = r₂, the two component values computed from q₁ and q₂ respectively.

It remains for us to contrast the efficiency of the alternative estimators for the three component model. We indicated below (74) the variety of estimation methods available. The selection of estimation method depends on the true values of γ₁ and γ₂. Hence the most efficient method is two components if the corresponding γ is close to one. CV is advisable if q is very small or λ large, and EC is good if q is large and γ not close to one. The most efficient estimator in every case is REC with appropriately chosen r. In general use one may choose r by the rules given in (42), but shift to another estimator in the special cases indicated by the previous paragraph. The general use of any other estimator is clearly inefficient.

We have developed the r-class of estimators and have shown that a specific member, REC, should be used in essentially all error component problems. In the process we have developed substantial information about small sample properties of error components, covariance, and REC estimators. Further theoretical and Monte Carlo analysis should broaden the specific assumptions and results.

References

[1]  Arora, S. S., "Error Components Regression Models and Their Applications," Annals of Economic and Social Measurement, 2, October 1973, 451-61.

[2]  Balestra, P. and M. Nerlove, "Pooling Cross Section and Time Series Data in the Estimation of a Dynamic Model: The Demand for Natural Gas," Econometrica, 34, July 1966, 585-612.

[3]  Fuller, W. A. and G. E. Battese, "Transformations for Estimation of Linear Models with Nested-Error Structure," Journal of the American Statistical Association, 68, September 1973, 626-32.

[4]  Fuller, W. A. and G. E. Battese, "Estimation of Linear Models with Crossed-Error Structure," Journal of Econometrics, 2, March 1974.

[5]  Johnson, Norman L. and Samuel Kotz, Discrete Distributions, Boston, 1969.

[6]  Johnson, Norman L. and Samuel Kotz, Continuous Univariate Distributions - 1, Boston, 1970.

[7]  Johnson, Norman L. and Samuel Kotz, Continuous Univariate Distributions - 2, Boston, 1970.

[8]  Nerlove, M., "Further Evidence on the Estimation of Dynamic Economic Relations from a Time Series of Cross Sections," Econometrica, 39, March 1971, 359-82.

[9]  Press, S. James, Applied Multivariate Analysis, New York, 1972.

[10] Swamy, P. A. V. B. and S. S.
Arora, "The Exact Finite Sample Properties of the Estimators of Coefficients in the Error Components Regression Models," Econometrica, 40, March 1972, 253-60.

[11] Wallace, T. D. and A. Hussain, "The Use of Error Components Models in Combining Cross Section with Time Series Data," Econometrica, 37, January 1969.