CHAPTER 19

The linear regression model I — specification, estimation and testing

19.1 Introduction
The linear regression model forms the backbone of most other statistical
models of particular interest in econometrics. A sound understanding of the
specification, estimation, testing and prediction in the linear regression
model holds the key to a better understanding of the other statistical models
discussed in the present book.
In relation to the Gauss linear model discussed in Chapter 18, apart from
some apparent similarity in the notation and the mathematical
manipulations involved in the statistical analysis, the linear regression
model purports to model a very different situation from the one envisaged
by the former. In particular the Gauss linear model could be considered to
be the appropriate statistical model for analysing estimable models of the
form
M_t = a₀ + a₁t + Σ_{i=1}^{3} c_i Q_{it},        (19.1)

M_t = Σ_{i=1}^{3} c_i Q_{it} + Σ_{i=1}^{k} d_i t^i,        (19.2)

where M_t refers to money and Q_{it}, i = 1, 2, 3, to quarterly dummy variables, in view of the non-stochastic nature of the x_{it}s involved. On the other hand,
estimable models such as

M = A Y^{α₁} P^{α₂} I^{α₃},        (19.3)

referring to a demand for money function (M – money, Y – income, P – price level, I – interest rate), could not be analysed in the context of the Gauss
linear model. This is because it is rather arbitrary to discriminate on
probabilistic grounds between the variable giving rise to the observed data
chosen for M and those for Y, P and I. For estimable models such as (3) the
linear regression model as sketched in Chapter 17 seems more appropriate,
especially if the observed data chosen do not exhibit time dependence. This
will become clearer in the present chapter after the specification of the linear
regression model in Section 19.2. The money demand function (3) is used to
illustrate the various concepts and results introduced throughout this
chapter.
19.2 Specification
Let {Z_t, t ∈ T} be a vector stochastic process on the probability space (S, ℱ, P(·)), where Z_t = (y_t, X_t')' represents the vector of random variables giving rise to the observed data chosen, with y_t being the variable whose behaviour we are aiming to explain. The stochastic process {Z_t, t ∈ T} is assumed to be normal, independent and identically distributed (NIID) with E(Z_t) = m and Cov(Z_t) = Σ, i.e.

Z_t ≡ (y_t, X_t')' ~ N( (m₁, m₂')', [σ₁₁  σ₁₂; σ₂₁  Σ₂₂] ),   ∀t ∈ T,        (19.4)
in an obvious notation (see Chapter 15). It is interesting to note at this stage
that these assumptions seem rather restrictive for most economic data in
general and time-series in particular.
On the basis of the assumption that {Z_t, t ∈ T} is a NIID vector stochastic process we can proceed to reduce the joint distribution D(Z₁, ..., Z_T; ψ) in order to define the statistical GM of the linear regression model using the general form

y_t = μ_t + u_t,   t ∈ T,        (19.5)

where μ_t = E(y_t/X_t = x_t) is the systematic component and u_t = y_t − E(y_t/X_t = x_t) the non-systematic component (see Chapter 17). In view of the normality of {Z_t, t ∈ T} we deduce that

μ_t = E(y_t/X_t = x_t) = β₀ + β'x_t   (linear in x_t),        (19.6)
where

β₀ = m₁ − σ₁₂Σ₂₂⁻¹m₂,   β = Σ₂₂⁻¹σ₂₁,

and

Var(u_t/X_t = x_t) = Var(y_t/X_t = x_t) = σ²   (homoskedastic),        (19.7)
where σ² = σ₁₁ − σ₁₂Σ₂₂⁻¹σ₂₁ (see Chapter 15). The time invariance of the parameters β₀, β and σ² stems from the identically distributed (ID) assumption related to {Z_t, t ∈ T}. It is important, however, to note that the ID assumption provides only a sufficient condition for the time invariance of the statistical parameters.
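Under these assumptions the regression parameters are simple functions of the joint moments (m, Σ). A minimal numerical sketch, with moment values invented purely for illustration (two conditioning variables), is:

```python
import numpy as np

# Illustrative joint moments for Z_t = (y_t, X_t')' with k = 2; the numbers
# are invented for this sketch, not taken from the text.
m1 = 1.0                          # m_1 = E(y_t)
m2 = np.array([2.0, 3.0])         # m_2 = E(X_t)
s11 = 4.0                         # sigma_11 = Var(y_t)
s12 = np.array([1.0, 0.5])        # sigma_12 = Cov(y_t, X_t)
S22 = np.array([[2.0, 0.3],
                [0.3, 1.0]])      # Sigma_22 = Cov(X_t)

beta = np.linalg.solve(S22, s12)                 # beta = Sigma_22^{-1} sigma_21
beta0 = m1 - s12 @ np.linalg.solve(S22, m2)      # beta_0 = m_1 - sigma_12 Sigma_22^{-1} m_2
sigma2 = s11 - s12 @ np.linalg.solve(S22, s12)   # conditional variance

print(beta0, beta, sigma2)
```

Any positive-definite Σ gives 0 < σ² ≤ σ₁₁, reflecting the fact that conditioning cannot increase the variance.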
In order to simplify the notation let us assume that m = 0 without any loss of generality, given that we can easily transform the original variables into mean deviation form (y_t − m₁) and (X_t − m₂). This implies that β₀, the coefficient of the constant, is zero and the systematic component becomes

E(y_t/X_t = x_t) = β'x_t.        (19.8)
In practice, however, unless the observed data are in mean deviation form the constant should never be dropped, because the estimates derived are not estimates of the regression coefficients β = Σ₂₂⁻¹σ₂₁ but of β* = E(X_tX_t')⁻¹E(X_t y_t); see Appendix 19.1 on the role of the constant.
The statistical GM of the linear regression model takes the particular form

y_t = β'x_t + u_t,   t ∈ T,        (19.9)
with θ = (β, σ²) being the statistical parameters of interest, the parameters in terms of which the statistical GM is defined. By construction the systematic and non-systematic components of (9) satisfy the following properties:
(i)   E(u_t/X_t = x_t) = E[(y_t − E(y_t/X_t = x_t))/X_t = x_t]
      = E(y_t/X_t = x_t) − E(y_t/X_t = x_t) = 0;
(ii)  E(u_t u_s/X_t = x_t) = E[(y_t − E(y_t/X_t = x_t))(y_s − E(y_s/X_s = x_s))/X_t = x_t]
      = σ² for t = s,  0 for t ≠ s;
(iii) E(μ_t u_t/X_t = x_t) = μ_t E(u_t/X_t = x_t) = 0,   t, s ∈ T.
The first two properties define {u_t, t ∈ T} to be a white-noise process and (iii) establishes the orthogonality of the two components. It is important to note that the above expectation operator E(·/X_t = x_t) is defined in terms of D(y_t/X_t; θ), which is the distribution underlying the probability model for (9). However, the above properties hold for E(·) defined in terms of D(Z_t; ψ) as well, given that:
(i)'   E(u_t) = E{E(u_t/X_t = x_t)} = 0;
(ii)'  E(u_t u_s) = E{E(u_t u_s/X_t = x_t)} = σ² for t = s,  0 for t ≠ s;
and
(iii)' E(μ_t u_t) = E{μ_t E(u_t/X_t = x_t)} = 0,   t, s ∈ T

(see Section 7.2 on conditional expectation).
The conditional distribution D(y_t/X_t; θ) is related to the joint distribution D(y_t, X_t; ψ) via the decomposition

D(y_t, X_t; ψ) = D(y_t/X_t; ψ₁)·D(X_t; ψ₂)        (19.10)

(see Chapter 5). In defining the probability model of the linear regression model as based on D(y_t/X_t; θ) we choose to ignore D(X_t; ψ₂) for the estimation of the statistical parameters of interest θ. For this to be possible we need to ensure that X_t is weakly exogenous with respect to θ for the sample period t = 1, 2, ..., T (see Section 19.3, below).
For the statistical parameters of interest θ = (β, σ²) to be well defined we need to ensure that Σ₂₂ is non-singular, in view of the formulae β = Σ₂₂⁻¹σ₂₁, σ² = σ₁₁ − σ₁₂Σ₂₂⁻¹σ₂₁, at least for the sample period t = 1, 2, ..., T. This requires that the sample equivalent of Σ₂₂, (1/T)(X'X) where X = (x₁, x₂, ..., x_T)', is indeed non-singular, i.e.

rank(X'X) = rank(X) = k,        (19.11)

X_t being a k × 1 vector.
As argued in Chapter 17, the statistical parameters of interest do not necessarily coincide with the theoretical parameters of interest ξ. We need, however, to ensure that ξ is uniquely defined in terms of θ for ξ to be identifiable. In constructing empirical econometric models we proceed from a well-defined estimated statistical GM (see Chapter 22) to reparametrise it in terms of the theoretical parameters of interest. Any restrictions induced by the reparametrisation, however, should be tested for their validity. For this reason no a priori restrictions are imposed on θ at the outset, to make such restrictions testable at a later stage.
As argued above, the probability model underlying (9) is defined in terms of D(y_t/X_t; θ) and takes the form

Φ = { D(y_t/X_t; θ) = [1/(σ√(2π))] exp[−(1/(2σ²))(y_t − β'x_t)²], θ ∈ ℝᵏ × ℝ₊, t ∈ T }.        (19.12)
Moreover, in view of the independence of {Z_t, t ∈ T} the sampling model takes the form of an independent sample, y = (y₁, ..., y_T)', sequentially drawn from D(y_t/X_t; θ), t = 1, 2, ..., T, respectively.
Having defined all three components of the linear regression model let us collect all the assumptions together and specify the statistical model properly.
The linear regression model: specification

(I) Statistical GM: y_t = β'x_t + u_t, t ∈ T.

[1] μ_t = E(y_t/X_t = x_t) — the systematic component; u_t = y_t − E(y_t/X_t = x_t) — the non-systematic component.
[2] θ = (β, σ²), β = Σ₂₂⁻¹σ₂₁, σ² = σ₁₁ − σ₁₂Σ₂₂⁻¹σ₂₁, are the statistical parameters of interest. (Note: Σ₂₂ = Cov(X_t), σ₂₁ = Cov(X_t, y_t), σ₁₁ = Var(y_t).)
[3] X_t is weakly exogenous with respect to θ, t = 1, 2, ..., T.
[4] No a priori information on θ.
[5] Rank(X) = k, X = (x₁, x₂, ..., x_T)' a T × k data matrix, (T > k).
(II) Probability model

Φ = { D(y_t/X_t; θ) = [1/(σ√(2π))] exp[−(1/(2σ²))(y_t − β'x_t)²], θ = (β, σ²) ∈ ℝᵏ × ℝ₊, t ∈ T }.

[6] (i) D(y_t/X_t; θ) is normal;
    (ii) E(y_t/X_t = x_t) = β'x_t — linear in x_t;
    (iii) Var(y_t/X_t = x_t) = σ² — homoskedastic (free of x_t);
[7] θ is time invariant.

(III) Sampling model

[8] y = (y₁, ..., y_T)' represents an independent sample sequentially drawn from D(y_t/X_t; θ), t = 1, 2, ..., T.

An important point to note about the above specification is that the
model is specified directly in terms of D(y_t/X_t; θ), making no assumptions about D(Z_t; ψ). For the specification of the linear regression model there is no need to make any assumptions related to {Z_t, t ∈ T}. The problem, however, is that the additional generality gained by going directly to D(y_t/X_t; θ) is more apparent than real. Despite the fact that the assumption that {Z_t, t ∈ T} is a NIID process is only sufficient (not necessary) for [6] to [8] above, it considerably enhances our understanding of econometric modelling in the context of the linear regression model. This is, firstly, because it is commonly easier in practice to judge the appropriateness of
probabilistic assumptions related to Z_t rather than (y_t/X_t = x_t); and, secondly, in the context of misspecification analysis possible sources for the departures from the underlying assumptions are of paramount importance. Such sources can commonly be traced to departures from the assumptions postulated for {Z_t, t ∈ T} (see Chapters 21–22).
Before we discuss the above assumptions underlying the linear regression model, it is of some interest to compare the above specification with the standard textbook approach, where the probabilistic assumptions are made in terms of the error term.
Standard textbook specification of the linear regression model

y = Xβ + u.
(1) (u/X) ~ N(0, σ²I_T);
(2) no a priori information on (β, σ²);
(3) rank(X) = k.

Assumption (1) implies the orthogonality E(x_t u_t/X_t = x_t) = 0, t = 1, 2, ..., T, and assumptions [6] to [8], the probability and the sampling models, respectively. This is because (y/X) is a linear function of u and thus normally distributed (see Chapter 15), i.e.

(y/X) ~ N(Xβ, σ²I_T).        (19.13)
As we can see, the sampling model assumption of independence is ‘hidden’
behind the form of the conditional covariance σ²I_T. Because of this the
independence assumption and its implications are not clearly recognised in
certain cases when the linear regression model is used in econometric
modelling. As argued in Chapter 17, the sampling model of an independent
sample is usually inappropriate when the observed data come in the form of
aggregate economic time series. Assumptions (2) and (3) are identical to [4]
and [5] above. The assumptions related to the parameters of interest θ = (β, σ²) and the weak exogeneity of X_t with respect to θ ([2] and [3] above) are not made in the context of the standard textbook specification. These assumptions related to the parametrisation of the statistical GM play a very important role in the context of the methodology proposed in Chapter 1 (see also Chapter 26). Several concepts such as weak exogeneity (see Section 19.3, below) and collinearity (see Sections 20.5–6) are only definable with respect to a given parametrisation. Moreover, the statistical GM is turned into an econometric model by reparametrisation, going from the statistical to the theoretical parameters of interest.
The most important difference between the specification [1]–[8] and (1)–(3), however, is the role attributed to the error term. In the context of the latter the probabilistic and sampling model assumptions are made in terms of the error term, not in terms of the observable random variables involved as in [1]–[8]. This difference has important implications in the context of misspecification testing (testing the underlying assumptions) and action thereof. The error term in the context of a statistical model as specified in the present book is by construction white-noise relative to a given information set 𝒜 ⊂ ℱ.
19.3 Discussion of the assumptions

[1] The systematic and non-systematic components
As argued in Chapter 17 (see also Chapter 26) the specification of a statistical model is based on the joint distribution of Z_t, t = 1, 2, ..., T, i.e.

D(Z₁, Z₂, ..., Z_T; ψ) ≡ D(Z; ψ),        (19.14)
which includes the relevant sample and measurement information.
The specification of the linear regression model can be viewed as directly related to (14) and derived by 'reduction' using the assumptions of normality and IID. The independence assumption enables us to reduce D(Z; ψ) into the product of the marginal distributions D(Z_t; ψ_t), t = 1, 2, ..., T, i.e.

D(Z; ψ) = ∏_{t=1}^{T} D(Z_t; ψ_t).        (19.15)

The identical distribution assumption enables us to deduce that ψ_t = ψ for t = 1, 2, ..., T. The next step in the reduction is the following decomposition of D(Z_t; ψ):

D(Z_t; ψ) = D(y_t/X_t; ψ₁)·D(X_t; ψ₂).        (19.16)

The normality assumption, with Σ > 0 and unrestricted, enables us to deduce the weak exogeneity of X_t relative to θ.
The choice of the relevant information set 𝒟_t = {X_t = x_t} depends crucially on the NIID assumptions; if these assumptions are invalid the choice of 𝒟_t will in general be inappropriate. Given this choice of 𝒟_t the systematic and non-systematic components are defined by:

μ_t = E(y_t/X_t = x_t),   u_t = y_t − E(y_t/X_t = x_t).        (19.17)

Under the NIID assumptions μ_t and u_t take the particular forms:

μ_t* = β'x_t,   u_t* = y_t − β'x_t.        (19.18)
Again, if the NIID assumptions are invalid then μ_t* ≠ μ_t, u_t* ≠ u_t and

E(u_t*μ_t*/X_t = x_t) ≠ 0        (19.19)

(see Chapters 21–22).
[2] The parameters of interest

As discussed in Chapter 17, the parameters in terms of which the statistical GM is defined constitute by definition the statistical parameters of interest, and they represent a particular parametrisation of the unknown parameters of the underlying probability model. In the case of the linear regression model the parameters of interest come in the form of θ = (β, σ²) where β = Σ₂₂⁻¹σ₂₁, σ² = σ₁₁ − σ₁₂Σ₂₂⁻¹σ₂₁. As argued above, the parametrisation θ depends not only on D(Z_t; ψ) but also on the assumptions of NIID. Any changes in Z_t or/and the NIID assumptions will in general change the parametrisation.
[3] Exogeneity

In the linear regression model we begin with D(y_t, X_t; ψ) and then we concentrate exclusively on D(y_t/X_t; ψ₁) where

D(y_t, X_t; ψ) = D(y_t/X_t; ψ₁)·D(X_t; ψ₂),        (19.20)

which implies that we choose to ignore the marginal distribution D(X_t; ψ₂). In order to be able to do that, this distribution must contain no information relevant for the estimation of the parameters of interest, θ = (β, σ²), i.e. the stochastic structure of X_t must be irrelevant for any inference on θ. Formalising this intuitive idea we say that X_t is weakly exogenous over the sample period for θ if there exists a reparametrisation with ψ = (ψ₁, ψ₂) such that:

(i) θ is a function of ψ₁ (θ = h(ψ₁));
(ii) ψ₁ and ψ₂ are variation free ((ψ₁, ψ₂) ∈ Ψ₁ × Ψ₂).

Variation free means that for any specific value ψ₂ in Ψ₂, ψ₁ can take any other value in Ψ₁ and vice versa. For more details on exogeneity see Engle, Hendry and Richard (1983). When the above conditions are not satisfied the marginal distribution of X_t cannot be ignored because it contains relevant information for any inference on θ.
[4] No a priori information on θ = (β, σ²)
This assumption is made at the outset in order to avoid imposing invalid testable restrictions on θ. At this stage the only relevant interpretation of θ is as statistical parameters, directly related to ψ₁ in D(y_t/X_t; ψ₁). As such, no a priori information seems likely to be available for θ. Such information is commonly related to the theoretical parameters of interest ξ. Before θ is used to define ξ, however, we need to ensure that the underlying statistical model is well defined (no misspecification) in terms of the observed data chosen.
[5] The observed data matrix X is of full rank

For the data matrix X = (x₁, x₂, ..., x_T)', T × k, we need to assume that rank(X) = k, k < T. The need for this assumption is not at all obvious at this stage, except perhaps as a sample equivalent to the assumption

rank(Σ₂₂) = k,

needed to enable us to define the parameters of interest θ. This is because rank(X) = rank(X'X), and (1/T)(X'X) can be seen as the sample moment equivalent to Σ₂₂.
[6] Normality, linearity, homoskedasticity

The assumption of normality of D(y_t, X_t; ψ) plays an important role in the specification as well as the statistical analysis of the linear regression model. As far as specification is concerned, normality of D(y_t, X_t; ψ) implies that:

(i) D(y_t/X_t; θ) is normal (see Chapter 15);
(ii) E(y_t/X_t = x_t) = β'x_t, a linear function of the observed value x_t of X_t;
(iii) Var(y_t/X_t = x_t) = σ², the conditional variance is free of x_t, i.e. homoskedastic.

Moreover, (i)–(iii) come very close to implying that D(y_t, X_t; ψ) is normal as well (see Chapter 24.2).
[7] Parameter time-invariance

As far as the parameter invariance assumption is concerned, we can see that it stems from the time invariance of the parameters of the distribution D(y_t, X_t; ψ); that is, from the identically distributed (ID) component of the normal IID assumption related to Z_t.

[8] Independent sample
The assumption that y is an independent sample from D(y_t/X_t; θ), t = 1, 2, ..., T, is one of the most crucial assumptions underlying the linear regression model. In econometrics this assumption should be looked at very closely because most economic time series have a distinct time dimension (dependency) which cannot be modelled exclusively in terms of the exogenous random variables X_t. In such cases the non-random sample assumption (see Chapter 23) might be more appropriate.
19.4 Estimation

(1) Maximum likelihood estimators
Let us consider the estimation of the linear regression model as specified by the assumptions [1]–[8] discussed above. Using the assumptions [6] to [8] we can deduce that the likelihood function for the model takes the form

L(β, σ²; y, X) = K(y) ∏_{t=1}^{T} [1/(σ√(2π))] exp[−(1/(2σ²))(y_t − β'x_t)²]
             = K(y)(2πσ²)^{−T/2} exp[−(1/(2σ²)) Σ_{t=1}^{T} (y_t − β'x_t)²].        (19.21)
Hence

log L = c − (T/2) log 2π − (T/2) log σ² − (1/(2σ²)) Σ_{t=1}^{T} (y_t − β'x_t)²,        (19.22)

∂log L/∂β = (1/σ²) Σ_{t=1}^{T} (y_t − β'x_t)x_t = 0,        (19.23)

∂log L/∂σ² = −(T/(2σ²)) + (1/(2σ⁴)) Σ_{t=1}^{T} (y_t − β'x_t)² = 0,        (19.24)

and solving these first-order conditions yields

β̂ = (Σ_{t=1}^{T} x_t x_t')⁻¹ Σ_{t=1}^{T} x_t y_t,        (19.25)

σ̂² = (1/T) Σ_{t=1}^{T} (y_t − β̂'x_t)² = (1/T) Σ_{t=1}^{T} û_t²,        (19.26)

in an obvious notation; these are the maximum likelihood estimators (MLE's) of β and σ², respectively. If we were to write the statistical GM, y_t = β'x_t + u_t, t = 1, 2, ..., T, in the matrix notation form

y = Xβ + u,        (19.27)

where y = (y₁, ..., y_T)' is T × 1, X = (x₁, ..., x_T)' is T × k, and u = (u₁, ..., u_T)' is T × 1, the MLE's take the more suggestive form

β̂ = (X'X)⁻¹X'y   and, for û = y − Xβ̂,   σ̂² = (1/T)û'û.        (19.28)
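The matrix formulae in (19.28) can be sketched directly in code; the design matrix, parameter values and sample size below are assumptions made for illustration, not taken from the text.

```python
import numpy as np

# Minimal sketch of beta_hat = (X'X)^{-1} X'y and sigma2_hat = u_hat'u_hat / T
# on synthetic data; all settings here are invented for illustration.
rng = np.random.default_rng(0)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # constant included
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=T)               # sigma^2 = 0.09

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)   # MLE of beta
u_hat = y - X @ beta_hat                   # residuals
sigma2_hat = u_hat @ u_hat / T             # MLE of sigma^2 (note: biased)

print(beta_hat, sigma2_hat)
```

By the first-order condition (19.23), the residuals are numerically orthogonal to every column of X.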
The information matrix I_T(θ) is defined by

I_T(θ) = E[(∂log L/∂θ)(∂log L/∂θ)'] = E[−∂²log L/∂θ ∂θ'],        (19.29)

where the last equality holds under the assumption that D(y_t/X_t; θ) represents the 'true' probability model. In the above case

−∂²log L/∂β ∂β' = (1/σ²) X'X,
∂²log L/∂σ² ∂β = −(1/σ⁴) Σ_{t=1}^{T} (y_t − β'x_t)x_t,   with E[∂²log L/∂σ² ∂β] = 0,
−∂²log L/∂(σ²)² = −(T/(2σ⁴)) + (1/σ⁶) Σ_{t=1}^{T} u_t²,   with E[−∂²log L/∂(σ²)²] = T/(2σ⁴).

Hence

I_T(θ) = [ (1/σ²)X'X   0 ;  0   T/(2σ⁴) ]   and   [I_T(θ)]⁻¹ = [ σ²(X'X)⁻¹   0 ;  0   2σ⁴/T ].        (19.30)
It is very important to remember that the expectation operator above is defined relative to the probability model D(y_t/X_t; θ).
In order to get some idea as to what the above matrix notation formulae look like, let us consider them for the simple model

y_t = β₁ + β₂x_t + u_t,   t = 1, 2, ..., T,        (19.31)

where

y = (y₁, y₂, ..., y_T)',   X = [1 x₁; 1 x₂; ...; 1 x_T],   β = (β₁, β₂)',   u = (u₁, u₂, ..., u_T)'.

In this case

(X'X)⁻¹ = [1/(T Σ_{t=1}^{T}(x_t − x̄)²)] [ Σx_t²   −Σx_t ;  −Σx_t   T ],

β̂₂ = Σ(x_t − x̄)(y_t − ȳ)/Σ(x_t − x̄)²,   β̂₁ = ȳ − β̂₂x̄,

σ̂² = (1/T) Σ_{t=1}^{T} (y_t − β̂₁ − β̂₂x_t)².

Compare these formulae with those of Chapter 18.
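The closed-form expressions for the simple model can be checked on a small made-up data set (the numbers below are invented for the sketch):

```python
import numpy as np

# Closed-form estimates for y_t = b1 + b2*x_t + u_t; illustrative data only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
T = len(y)

b2 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b1 = y.mean() - b2 * x.mean()
s2_hat = ((y - b1 - b2 * x) ** 2).sum() / T   # MLE of sigma^2

print(b1, b2, s2_hat)
```

The same estimates are obtained from the general matrix formula with X = [1, x] stacked column-wise, since the simple model is just the k = 2 special case.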
One very important feature of the MLE β̂ above is that it preserves the original orthogonality between the systematic and non-systematic components,

y = μ + u,   μ ⊥ u,        (19.32)

in the form of orthogonality between the estimated systematic and non-systematic components,

y = μ̂ + û,   μ̂ ⊥ û,        (19.33)
with μ̂ = Xβ̂ and û = y − Xβ̂, respectively. This is because

μ̂ = P_X y   and   û = (I − P_X)y,        (19.34)

where P_X = X(X'X)⁻¹X' is a symmetric (P_X' = P_X), idempotent (P_X² = P_X) matrix (i.e. it represents an orthogonal projection), and

E(μ̂û') = E(P_X yy'(I − P_X))
        = E(P_X yu'(I − P_X)),   since (I − P_X)y = (I − P_X)u,
        = P_X(I − P_X)σ²,        since E(yu') = σ²I,
        = 0,                     since P_X(I − P_X) = 0.
In other words, the systematic and non-systematic components were estimated in such a way as to preserve the original orthogonality. Geometrically, P_X and (I − P_X) represent orthogonal projectors onto the subspace spanned by the columns of X, say ℳ(X), and onto its orthogonal complement ℳ(X)⊥, respectively. The systematic component was estimated by projecting y onto ℳ(X) and the non-systematic component by projecting y onto ℳ(X)⊥, i.e.

y = P_X y + (I − P_X)y.        (19.35)

Moreover, this orthogonality, which is equivalent to independence in this context, is passed on to the MLE's β̂ and σ̂², since μ̂ is independent of û'û = y'(I − P_X)y, the residual sum of squares, because P_X(I − P_X) = 0 (see Q6, Chapter 15). Given that μ̂ = Xβ̂ and σ̂² = (1/T)û'û, we can deduce that β̂ and σ̂² are independent; see (E2) of Section 7.1.
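The projection algebra above is easy to verify numerically; the data below are arbitrary random draws, an assumption made purely for the sketch.

```python
import numpy as np

# P_X is a symmetric idempotent projector, and the estimated components
# mu_hat = P_X y and u_hat = (I - P_X) y are orthogonal and sum to y.
rng = np.random.default_rng(1)
T, k = 30, 4
X = rng.normal(size=(T, k))   # arbitrary full-rank design
y = rng.normal(size=T)        # arbitrary observations

P = X @ np.linalg.inv(X.T @ X) @ X.T   # orthogonal projector onto span(X)
M = np.eye(T) - P                      # projector onto the orthogonal complement
mu_hat = P @ y
u_hat = M @ y

print(mu_hat @ u_hat)   # inner product is zero up to rounding
```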
Another feature of the MLE's β̂ and σ̂² worth noting is the suggestive similarity between these estimators and the parameters β and σ²:

β = Σ₂₂⁻¹σ₂₁,   β̂ = (X'X/T)⁻¹(X'y/T);        (19.36)

σ² = σ₁₁ − σ₁₂Σ₂₂⁻¹σ₂₁,   σ̂² = (1/T)y'y − (1/T)y'X(X'X)⁻¹X'y.        (19.37)

Looking at these formulae we can see that the MLE's of β and σ² can be derived by substituting the sample moment equivalents for the population moments:

Σ₂₂ → (1/T)(X'X),   σ₂₁ → (1/T)X'y,   σ₁₁ → (1/T)y'y.        (19.38)
Using the orthogonality of the estimated components μ̂ and û we can decompose the variation in y, as measured by y'y, into

y'y = μ̂'μ̂ + û'û = β̂'X'Xβ̂ + û'û.        (19.39)

Using this decomposition we could define the sample equivalent of the multiple correlation coefficient (see Chapter 15) to be

R² = y'X(X'X)⁻¹X'y / y'y = 1 − û'û/(y'y).        (19.40)

This represents the ratio of the variation 'explained' by μ̂ over the total variation and can be used as a measure of goodness of fit for the linear regression model. A similar measure of fit can be constructed using the decomposition of y around its mean ȳ, that is

(y'y − Tȳ²) = (μ̂'μ̂ − Tȳ²) + û'û,        (19.41)
denoted as

TSS = ESS + RSS,        (19.42)

where TSS is the total, ESS the explained and RSS the residual sum of squares (SS stands for sums of squares). The multiple correlation coefficient in this case takes the form

R̃² = (μ̂'μ̂ − Tȳ²)/(y'y − Tȳ²) = 1 − RSS/TSS.        (19.43)

Note that R² was used in Chapter 15 to denote the population multiple correlation coefficient, but in the econometrics literature R² is also used to denote both (40) and (43). Both of the above measures of 'goodness of fit', R² and R̃², have variously been defined to be the sample multiple correlation coefficient in the econometric literature. Caution should be exercised when reading different textbooks because R² and R̃² have different properties. For example, 0 ≤ R̃² ≤ 1 is valid only when one of the variables in X_t is the constant term. On the role of the constant term see Appendix 19.1.
One serious objection to the use of R̃² as a goodness-of-fit measure is the fact that as the number k of regressors increases, R̃² increases as well, irrespective of whether the regressors are relevant or not. For this reason a 'corrected' goodness-of-fit measure is defined by

R̄² = 1 − [û'û/(T − k)] / [(y'y − Tȳ²)/(T − 1)].        (19.44)
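These goodness-of-fit measures can be sketched on synthetic data (the design and coefficients below are invented). With a constant among the regressors the centered measure lies in [0, 1], and the corrected measure never exceeds it:

```python
import numpy as np

# Centered goodness-of-fit measure and its degrees-of-freedom correction;
# everything here is illustrative synthetic data.
rng = np.random.default_rng(2)
T, k = 50, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # includes constant
y = X @ np.array([1.0, 0.8, -0.4]) + rng.normal(size=T)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat
TSS = ((y - y.mean()) ** 2).sum()
RSS = (u_hat ** 2).sum()
ESS = TSS - RSS

R2 = 1 - RSS / TSS                               # centered measure
R2_bar = 1 - (RSS / (T - k)) / (TSS / (T - 1))   # corrected measure

print(R2, R2_bar)
```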
The correction is the division of the statistics involved by their corresponding degrees of freedom; see Theil (1971).

(2) An empirical example
In order to illustrate some of the concepts and results introduced so far let us consider estimating a transactions demand for money. Using the simplest form of a demand function we can postulate the theoretical model:

M^D = h(Y, P, I),        (19.45)

where M^D is the transactions demand for money, Y is income, P is the price level and I is the short-run interest rate referring to the opportunity cost of holding transactions money. Assuming a multiplicative form for h(·) the demand function takes the form

M^D = A Y^{α₁} P^{α₂} I^{α₃}        (19.46)

or

ln M^D = α₀ + α₁ ln Y + α₂ ln P + α₃ ln I,        (19.47)

where ln stands for logₑ and α₀ = ln A.
For expositional purposes let us adopt the commonly accepted approach to econometric modelling (see Chapter 1) in an attempt to highlight some of the problems associated with it. If we were to ignore the discussion on econometric modelling in Chapter 1 and proceed by using the usual 'textbook' approach, the next step is to transform the theoretical model into an econometric model by adding an error term, i.e. the econometric model is

m_t = α₀ + α₁y_t + α₂p_t + α₃i_t + u_t,        (19.48)

where m_t = ln M_t, y_t = ln Y_t, p_t = ln P_t, i_t = ln I_t and u_t ~ NI(0, σ²). Choosing some observed data series corresponding to the theoretical variables, M, Y, P and I, say:

M_t – M1 money stock;
Y_t – real consumers' expenditure;
P_t – implicit price deflator of Y_t;
I_t – interest rate on 7 days' deposit account

(see Chapter 17 and its appendix for these data series), respectively, the above equation can be transformed into the linear regression statistical GM:

m_t = β₀ + β₁y_t + β₂p_t + β₃i_t + u_t.        (19.49)

Estimation of this equation for the period 1963i–1982iv (T = 80) using
quarterly seasonally adjusted (for convenience) data yields

β̂ = (2.896, 0.690, 0.865, −0.055)',   s² = 0.00155,
TSS = 24.954,   ESS = 24.836,   RSS = 0.118,   R̃² = 0.9953,   R̄² = 0.9951.

That is, the estimated equation takes the form

m_t = 2.896 + 0.690y_t + 0.865p_t − 0.055i_t + û_t.        (19.50)
The danger at this point is to get carried away and start discussing the plausibility of the sign and size of the estimated 'elasticities' (?). For example, we might be tempted to argue that the estimated 'elasticities' have both a 'correct' sign and the size assumed on a priori grounds. Moreover, the 'goodness of fit' measures show that we explain 99.5% of the variation. Taken together these results 'indicate' that (50) is a good empirical model for the transactions demand for money. This, however, would be rather premature in view of the fact that before any discussion of a priori economic theory information we need to have a well-defined estimated statistical model which at least summarises the sample information adequately. Well defined in the present context refers to ensuring that the assumptions underlying the statistical model adopted are valid. This is because any formal testing of a priori restrictions can only be based on the underlying assumptions, which when invalid render the testing procedures incorrect.
Looking at the above estimated equation in view of the discussion of econometric modelling in Chapter 1, several objections might be raised:

(i) The observed data chosen do not correspond one-to-one to the theoretical variables, and thus the estimable model might be different from the theoretical model (see Chapter 23).
(ii) The sampling model of an independent sample seems questionable in view of the time paths of the observed data (see Fig. 17.1).
(iii) The high R̃² (and R̄²) is due to the fact that the data series for M_t and P_t have a very similar time trend (see Fig. 17.1(a) and (c)). If we look at the time path of the actual (y_t) and fitted (ŷ_t) values we notice that ŷ_t 'tracks' (explains) largely the trend and very little else (see Fig. 19.1). An obvious way to get some idea of the trend's contribution to R̃² is to subtract p_t from both sides of the money equation in an attempt to 'detrend' the dependent variable.
Fig. 19.1. Actual y_t = ln M_t and fitted ŷ_t from (19.50).
Fig. 19.2. Actual y_t = ln(M/P)_t and fitted ŷ_t from (19.51).
In Fig. 19.2 the actual and fitted values of the 'largely' detrended dependent variable (m_t − p_t) are shown to emphasise the point. The new regression equation yielded

(m_t − p_t) = 2.896 + 0.690y_t − 0.135p_t − 0.055i_t + û_t,        (19.51)

R̃² = 0.468,   R̄² = 0.447,   s² = 0.00155.
Looking at this estimated equation we can see that the coefficients of the constant, y_t and i_t are identical in value to the previous estimated equation. The estimated coefficient of p_t is, as expected, one minus the original estimate, and the s² is identical for both estimated equations. These suggest that the two estimated equations are identical as far as the estimated coefficients are concerned. This is a special case of a more general result related to arbitrary linear combinations of the x_{it}s subtracted from both sides of the statistical GM. In order to see this let us subtract γ'x_t from both sides of the statistical GM:

y_t − γ'x_t = (β − γ)'x_t + u_t,   or   y_t* = β*'x_t + u_t,        (19.52)
in an obvious notation. It is easy to see that the non-systematic component as well as σ² remain unchanged. Moreover, in view of the equality

û* = û,        (19.53)

where û* = y* − Xβ̂*, β̂* = (X'X)⁻¹X'y* = β̂ − γ, we can deduce that

s*² = û*'û*/(T − k) = û'û/(T − k) = s².        (19.54)
On the other hand, R̃² is not invariant to this transformation because

R̃*² = 1 − û'û/(y*'y* − Tȳ*²) ≠ R̃².        (19.55)

As we can see, the R̃² of the 'detrended' dependent variable equation is less than half of the original. This confirms the suggestion that the trend in p_t contributes significantly to the high value of the original R̃². It is important to note at this stage that trending data series can be a problem when the asymptotic properties of the MLE's are used uncritically (see sub-section (4) below).
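The invariance results in (19.52)–(19.54) can be illustrated numerically: subtracting γ'x_t from y_t shifts the coefficient estimates by γ and leaves the residuals unchanged, while the centered goodness-of-fit measure changes. All data below are synthetic assumptions for the sketch.

```python
import numpy as np

# Subtracting gamma'x_t from the dependent variable: coefficients shift by gamma,
# residuals (and hence s^2) are unchanged, the centered fit measure is not.
rng = np.random.default_rng(3)
T, k = 60, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([2.0, 0.7, -0.3]) + rng.normal(scale=0.5, size=T)

gamma = np.array([0.0, 0.0, -1.0])   # e.g. move one regressor across, as with m_t - p_t
y_star = y - X @ gamma

b = np.linalg.lstsq(X, y, rcond=None)[0]
b_star = np.linalg.lstsq(X, y_star, rcond=None)[0]
u = y - X @ b
u_star = y_star - X @ b_star

def R2(dep, resid):
    return 1 - (resid @ resid) / ((dep - dep.mean()) ** 2).sum()

print(b_star - (b - gamma), R2(y, u), R2(y_star, u_star))
```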
(3) Properties of the MLE θ̂ = (β̂, σ̂²) — finite sample

In order to decide whether the MLE θ̂ is a 'good' estimator of θ we need to consider its properties. The finite sample properties (see Chapters 12 and 13) will be considered first and then the asymptotic properties. θ̂, being a MLE, satisfies certain properties by definition:

(1) For a Borel function h(·) the MLE of h(θ) is h(θ̂). For example, the MLE of log(β'β) is log(β̂'β̂).
(2) If a minimal sufficient statistic τ(y) exists, then θ̂ must be a function of it.
Using the Lehmann–Scheffé theorem (see Chapter 12) we can deduce that the values of y for which the ratio

D(y/X; θ)/D(y₀/X; θ) = (2πσ²)^{−T/2} exp[−(1/(2σ²))(y − Xβ)'(y − Xβ)] / {(2πσ²)^{−T/2} exp[−(1/(2σ²))(y₀ − Xβ)'(y₀ − Xβ)]}        (19.56)

is independent of θ are those with y₀'y₀ = y'y and X'y₀ = X'y. Hence the minimal sufficient statistic is τ(y) = (τ₁(y), τ₂(y)) = (y'y, X'y), and β̂ = (X'X)⁻¹τ₂(y), σ̂² = (1/T)(τ₁(y) − τ₂(y)'(X'X)⁻¹τ₂(y)) are indeed functions of τ(y).
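That the MLE's depend on the data only through τ(y) = (y'y, X'y) can be checked by computing them from these two statistics alone; the data below are synthetic, chosen only for the sketch.

```python
import numpy as np

# The MLE's are functions of the minimal sufficient statistic (y'y, X'y) alone.
rng = np.random.default_rng(4)
T, k = 40, 3
X = rng.normal(size=(T, k))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=T)

t1 = y @ y       # tau_1(y) = y'y
t2 = X.T @ y     # tau_2(y) = X'y

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ t2                          # beta_hat from tau(y)
sigma2_hat = (t1 - t2 @ XtX_inv @ t2) / T        # sigma2_hat from tau(y)

# direct computation for comparison
beta_direct = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta_direct
print(beta_hat, sigma2_hat)
```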
In order to discuss any other properties of the MLE θ̂ of θ we need to derive the sampling distribution of θ̂. Given that β̂ and σ̂² are independent we can consider them separately.

The distribution of β̂

β̂ = (X'X)⁻¹X'y = Ly,        (19.57)

where L = (X'X)⁻¹X' is a k × T matrix of known constants. That is, β̂ is a linear function of the normally distributed random vector y. Hence β̂ ~ N(LXβ, σ²LL') or, from N1 of Chapter 15,

β̂ ~ N(β, σ²(X'X)⁻¹).        (19.58)
From the sampling distribution (58) we can deduce the following properties for β̂:

(3(i)) β̂ is an unbiased estimator of β, since E(β̂) = β, i.e. the sampling distribution of β̂ has mean equal to β.
(4(i)) β̂ is a fully efficient estimator of β, since Cov(β̂) = σ²(X'X)⁻¹, i.e. Cov(β̂) achieves the Cramér–Rao lower bound; see (30) above.
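The sampling distribution in (19.58) can be illustrated by Monte Carlo, holding X fixed across replications; sample size, parameter values and the number of replications below are all assumptions for the sketch.

```python
import numpy as np

# With X fixed, repeated draws of y give an empirical mean of beta_hat close to
# beta and an empirical covariance close to sigma^2 (X'X)^{-1}.
rng = np.random.default_rng(5)
T, k, reps = 50, 2, 5000
sigma = 1.0
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.empty((reps, k))
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=T)
    draws[r] = XtX_inv @ (X.T @ y)   # beta_hat for this replication

print(draws.mean(axis=0), np.cov(draws.T))
```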
The distribution of σ̂²

σ̂² = (1/T)(y − Xβ̂)'(y − Xβ̂) = (1/T)û'û = (1/T)u'M_X u,   where M_X = I − P_X.        (19.59)

From (Q2) of Chapter 15 we can deduce that

(u'M_X u)/σ² ~ χ²(tr M_X),        (19.60)
where tr M_X refers to the trace of M_X (tr A = Σᵢ aᵢᵢ for A n × n). Now

tr M_X = tr I − tr X(X'X)⁻¹X'   (since tr(A + B) = tr A + tr B)
       = T − tr (X'X)⁻¹(X'X)    (since tr(AB) = tr(BA))
       = T − k.

Hence, we can deduce that

(Tσ̂²/σ²) ~ χ²(T − k).        (19.61)

Intuitively we can explain this result as saying that (u'M_X u)/σ² represents the summation of the squares of T − k independent standard normal components.
Using (61) we can deduce that

E(Tσ̂²/σ²) = T − k   and   Var(Tσ̂²/σ²) = 2(T − k)

(see Appendix 6.1). These results imply that

E(σ̂²) = [(T − k)/T]σ² ≠ σ²,   Var(σ̂²) = 2(T − k)σ⁴/T² < 2σ⁴/T,

where 2σ⁴/T is the Cramér–Rao lower bound. That is:

(3(ii)) σ̂² is a biased estimator of σ²; and
(4(ii)) σ̂² is not a fully efficient estimator of σ².
However, (3(ii)) implies that for

s² = û'û/(T − k),   [(T − k)s²/σ²] ~ χ²(T − k),        (19.62)

and

E(s²) = σ²,   Var(s²) = 2σ⁴/(T − k) > 2σ⁴/T,        (19.63)

where 2σ⁴/T is the Cramér–Rao bound.
That is, s² is an unbiased estimator of σ², although it does not quite achieve the Cramér–Rao lower bound given by the information matrix (30) above. It turns out, however, that no other unbiased estimator of σ² achieves that bound, and among such estimators s² has minimum variance. In statistical inference relating to the linear regression model s² is preferred to σ̂² as an estimator of σ².
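The bias of σ̂² and the unbiasedness of s² follow from (19.61) and can be illustrated by simulating u'M_X u; all settings below are invented for the sketch.

```python
import numpy as np

# Tsigma2_hat/sigma^2 ~ chi^2(T-k), so dividing the residual sum of squares
# by T gives a downward-biased estimator while dividing by T-k does not.
rng = np.random.default_rng(6)
T, k, reps = 20, 4, 20000
sigma2 = 2.0
X = rng.normal(size=(T, k))
P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(T) - P                      # M_X = I - P_X

rss = np.empty(reps)
for r in range(reps):
    u = rng.normal(scale=np.sqrt(sigma2), size=T)
    rss[r] = u @ M @ u                 # u'M_X u = residual sum of squares

sigma2_hat_mean = (rss / T).mean()     # estimates ((T-k)/T)*sigma2, i.e. biased down
s2_mean = (rss / (T - k)).mean()       # estimates sigma2

print(sigma2_hat_mean, s2_mean)
```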
The sampling distributions of the estimators β̂ and s² involve the