CHAPTER 19

The linear regression model I — specification, estimation and testing

19.1 Introduction
The linear regression model forms the backbone of most other statistical
models of particular interest in econometrics. A sound understanding of the
specification, estimation, testing and prediction in the linear regression
model holds the key to a better understanding of the other statistical models
discussed in the present book.
In relation to the Gauss linear model discussed in Chapter 18, apart from
some apparent similarity in the notation and the mathematical
manipulations involved in the statistical analysis, the linear regression
model purports to model a very different situation from the one envisaged
by the former. In particular the Gauss linear model could be considered to
be the appropriate statistical model for analysing estimable models of the
form
M_t = a₀ + a₁t + Σ_{i=1}^{3} c_i Q_{it},        (19.1)

M_t = Σ_{i=1}^{3} c_i Q_{it} + Σ_{i=1}^{k} d_i t^i,        (19.2)

where M_t refers to money and Q_{it}, i = 1, 2, 3, to quarterly dummy variables, in view of the non-stochastic nature of the x_{it}s involved. On the other hand,
estimable models such as

M = A Y^{α₁} P^{α₂} I^{α₃},        (19.3)

referring to a demand for money function (M – money, Y – income, P – price level, I – interest rate), could not be analysed in the context of the Gauss
linear model. This is because it is rather arbitrary to discriminate on
probabilistic grounds between the variable giving rise to the observed data
chosen for M and those for Y, P and I. For estimable models such as (3) the
linear regression model as sketched in Chapter 17 seems more appropriate,
especially if the observed data chosen do not exhibit time dependence. This
will become clearer in the present chapter after the specification of the linear
regression model in Section 19.2. The money demand function (3) is used to
illustrate the various concepts and results introduced throughout this
chapter.
19.2 Specification
Let {Z_t, t ∈ T} be a vector stochastic process on the probability space (S, ℱ, P(·)), where Z_t = (y_t, X_t')' represents the vector of random variables giving rise to the observed data chosen, with y_t being the variable whose behaviour we are aiming to explain. The stochastic process {Z_t, t ∈ T} is assumed to be normal, independent and identically distributed (NIID) with E(Z_t) = m and Cov(Z_t) = Σ, i.e.

Z_t ≡ (y_t, X_t')' ~ N( (m₁, m₂')', [σ₁₁  σ₁₂; σ₂₁  Σ₂₂] ),   ∀t ∈ T,        (19.4)
in an obvious notation (see Chapter 15). It is interesting to note at this stage
that these assumptions seem rather restrictive for most economic data in
general and time-series in particular.
On the basis of the assumption that {Z_t, t ∈ T} is a NIID vector stochastic process we can proceed to reduce the joint distribution D(Z₁, ..., Z_T; ψ) in order to define the statistical GM of the linear regression model using the general form

y_t = μ_t + u_t,   t ∈ T,        (19.5)

where μ_t = E(y_t/X_t = x_t) is the systematic component and u_t = y_t − E(y_t/X_t = x_t) the non-systematic component (see Chapter 17). In view of the normality of {Z_t, t ∈ T} we deduce that

μ_t = E(y_t/X_t = x_t) = β₀ + β'x_t   (linear in x_t),        (19.6)
where

β₀ = m₁ − σ₁₂Σ₂₂⁻¹m₂,   β = Σ₂₂⁻¹σ₂₁,

and

Var(u_t/X_t = x_t) = Var(y_t/X_t = x_t) = σ²   (homoskedastic),        (19.7)
where σ² = σ₁₁ − σ₁₂Σ₂₂⁻¹σ₂₁ (see Chapter 15). The time invariance of the parameters β₀, β and σ² stems from the identically distributed (ID) assumption related to {Z_t, t ∈ T}. It is important, however, to note that the ID assumption provides only a sufficient condition for the time invariance of the statistical parameters.
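Under these assumptions the regression parameters are simple functions of the joint moments (m, Σ). A minimal numerical sketch, with moment values invented purely for illustration (two conditioning variables), is:

```python
import numpy as np

# Illustrative joint moments for Z_t = (y_t, X_t')' with k = 2; the numbers
# are invented for this sketch, not taken from the text.
m1 = 1.0                          # m_1 = E(y_t)
m2 = np.array([2.0, 3.0])         # m_2 = E(X_t)
s11 = 4.0                         # sigma_11 = Var(y_t)
s12 = np.array([1.0, 0.5])        # sigma_12 = Cov(y_t, X_t)
S22 = np.array([[2.0, 0.3],
                [0.3, 1.0]])      # Sigma_22 = Cov(X_t)

beta = np.linalg.solve(S22, s12)                 # beta = Sigma_22^{-1} sigma_21
beta0 = m1 - s12 @ np.linalg.solve(S22, m2)      # beta_0 = m_1 - sigma_12 Sigma_22^{-1} m_2
sigma2 = s11 - s12 @ np.linalg.solve(S22, s12)   # conditional variance

print(beta0, beta, sigma2)
```

Any positive-definite Σ gives 0 < σ² ≤ σ₁₁, reflecting the fact that conditioning cannot increase the variance.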
In order to simplify the notation let us assume that m = 0 without any loss of generality, given that we can easily transform the original variables into mean deviation form (y_t − m₁) and (X_t − m₂). This implies that β₀, the coefficient of the constant, is zero and the systematic component becomes

E(y_t/X_t = x_t) = β'x_t.        (19.8)
In practice, however, unless the observed data are in mean deviation form the constant should never be dropped, because the estimates derived are not estimates of the regression coefficients β = Σ₂₂⁻¹σ₂₁ but of β* = E(X_tX_t')⁻¹E(X_t y_t); see Appendix 19.1 on the role of the constant.
The statistical GM of the linear regression model takes the particular form

y_t = β'x_t + u_t,   t ∈ T,        (19.9)
with θ = (β, σ²) being the statistical parameters of interest, the parameters in terms of which the statistical GM is defined. By construction the systematic and non-systematic components of (9) satisfy the following properties:
(i)   E(u_t/X_t = x_t) = E[(y_t − E(y_t/X_t = x_t))/X_t = x_t]
      = E(y_t/X_t = x_t) − E(y_t/X_t = x_t) = 0;
(ii)  E(u_t u_s/X_t = x_t) = E[(y_t − E(y_t/X_t = x_t))(y_s − E(y_s/X_s = x_s))/X_t = x_t]
      = σ² for t = s,  0 for t ≠ s;
(iii) E(μ_t u_t/X_t = x_t) = μ_t E(u_t/X_t = x_t) = 0,   t, s ∈ T.
The first two properties define {u_t, t ∈ T} to be a white-noise process and (iii) establishes the orthogonality of the two components. It is important to note that the above expectation operator E(·/X_t = x_t) is defined in terms of D(y_t/X_t; θ), which is the distribution underlying the probability model for (9). However, the above properties hold for E(·) defined in terms of D(Z_t; ψ) as well, given that:
(i)'   E(u_t) = E{E(u_t/X_t = x_t)} = 0;
(ii)'  E(u_t u_s) = E{E(u_t u_s/X_t = x_t)} = σ² for t = s,  0 for t ≠ s;
and
(iii)' E(μ_t u_t) = E{μ_t E(u_t/X_t = x_t)} = 0,   t, s ∈ T

(see Section 7.2 on conditional expectation).
The conditional distribution D(y_t/X_t; θ) is related to the joint distribution D(y_t, X_t; ψ) via the decomposition

D(y_t, X_t; ψ) = D(y_t/X_t; ψ₁)·D(X_t; ψ₂)        (19.10)

(see Chapter 5). In defining the probability model of the linear regression model as based on D(y_t/X_t; θ) we choose to ignore D(X_t; ψ₂) for the estimation of the statistical parameters of interest θ. For this to be possible we need to ensure that X_t is weakly exogenous with respect to θ for the sample period t = 1, 2, ..., T (see Section 19.3, below).
For the statistical parameters of interest θ = (β, σ²) to be well defined we need to ensure that Σ₂₂ is non-singular, in view of the formulae β = Σ₂₂⁻¹σ₂₁, σ² = σ₁₁ − σ₁₂Σ₂₂⁻¹σ₂₁, at least for the sample period t = 1, 2, ..., T. This requires that the sample equivalent of Σ₂₂, (1/T)(X'X) where X = (x₁, x₂, ..., x_T)', is indeed non-singular, i.e.

rank(X'X) = rank(X) = k,        (19.11)

X_t being a k × 1 vector.
As argued in Chapter 17, the statistical parameters of interest do not necessarily coincide with the theoretical parameters of interest ξ. We need, however, to ensure that ξ is uniquely defined in terms of θ for ξ to be identifiable. In constructing empirical econometric models we proceed from a well-defined estimated statistical GM (see Chapter 22) to reparametrise it in terms of the theoretical parameters of interest. Any restrictions induced by the reparametrisation, however, should be tested for their validity. For this reason no a priori restrictions are imposed on θ at the outset, to make such restrictions testable at a later stage.
As argued above, the probability model underlying (9) is defined in terms of D(y_t/X_t; θ) and takes the form

Φ = { D(y_t/X_t; θ) = [1/(σ√(2π))] exp[−(1/(2σ²))(y_t − β'x_t)²], θ ∈ ℝᵏ × ℝ₊, t ∈ T }.        (19.12)
Moreover, in view of the independence of {Z_t, t ∈ T} the sampling model takes the form of an independent sample, y = (y₁, ..., y_T)', sequentially drawn from D(y_t/X_t; θ), t = 1, 2, ..., T, respectively.
Having defined all three components of the linear regression model let us collect all the assumptions together and specify the statistical model properly.
The linear regression model: specification

(I) Statistical GM: y_t = β'x_t + u_t, t ∈ T.

[1] μ_t = E(y_t/X_t = x_t) — the systematic component; u_t = y_t − E(y_t/X_t = x_t) — the non-systematic component.
[2] θ = (β, σ²), β = Σ₂₂⁻¹σ₂₁, σ² = σ₁₁ − σ₁₂Σ₂₂⁻¹σ₂₁, are the statistical parameters of interest. (Note: Σ₂₂ = Cov(X_t), σ₂₁ = Cov(X_t, y_t), σ₁₁ = Var(y_t).)
[3] X_t is weakly exogenous with respect to θ, t = 1, 2, ..., T.
[4] No a priori information on θ.
[5] Rank(X) = k, X = (x₁, x₂, ..., x_T)' a T × k data matrix, (T > k).
(II) Probability model

Φ = { D(y_t/X_t; θ) = [1/(σ√(2π))] exp[−(1/(2σ²))(y_t − β'x_t)²], θ = (β, σ²) ∈ ℝᵏ × ℝ₊, t ∈ T }.

[6] (i) D(y_t/X_t; θ) is normal;
    (ii) E(y_t/X_t = x_t) = β'x_t — linear in x_t;
    (iii) Var(y_t/X_t = x_t) = σ² — homoskedastic (free of x_t);
[7] θ is time invariant.

(III) Sampling model

[8] y = (y₁, ..., y_T)' represents an independent sample sequentially drawn from D(y_t/X_t; θ), t = 1, 2, ..., T.

An important point to note about the above specification is that the
model is specified directly in terms of D(y_t/X_t; θ), making no assumptions about D(Z_t; ψ). For the specification of the linear regression model there is no need to make any assumptions related to {Z_t, t ∈ T}. The problem, however, is that the additional generality gained by going directly to D(y_t/X_t; θ) is more apparent than real. Despite the fact that the assumption that {Z_t, t ∈ T} is a NIID process is only sufficient (not necessary) for [6] to [8] above, it considerably enhances our understanding of econometric modelling in the context of the linear regression model. This is, firstly, because it is commonly easier in practice to judge the appropriateness of
probabilistic assumptions related to Z_t rather than (y_t/X_t = x_t); and, secondly, in the context of misspecification analysis possible sources for the departures from the underlying assumptions are of paramount importance. Such sources can commonly be traced to departures from the assumptions postulated for {Z_t, t ∈ T} (see Chapters 21–22).
Before we discuss the above assumptions underlying the linear regression model, it is of some interest to compare the above specification with the standard textbook approach, where the probabilistic assumptions are made in terms of the error term.
Standard textbook specification of the linear regression model

y = Xβ + u.
(1) (u/X) ~ N(0, σ²I_T);
(2) no a priori information on (β, σ²);
(3) rank(X) = k.

Assumption (1) implies the orthogonality E(x_t u_t/X_t = x_t) = 0, t = 1, 2, ..., T, and assumptions [6] to [8], the probability and the sampling models, respectively. This is because (y/X) is a linear function of u and thus normally distributed (see Chapter 15), i.e.

(y/X) ~ N(Xβ, σ²I_T).        (19.13)
As we can see, the sampling model assumption of independence is ‘hidden’
behind the form of the conditional covariance σ²I_T. Because of this the
independence assumption and its implications are not clearly recognised in
certain cases when the linear regression model is used in econometric
modelling. As argued in Chapter 17, the sampling model of an independent
sample is usually inappropriate when the observed data come in the form of
aggregate economic time series. Assumptions (2) and (3) are identical to [4]
and [5] above. The assumptions related to the parameters of interest θ = (β, σ²) and the weak exogeneity of X_t with respect to θ ([2] and [3] above) are not made in the context of the standard textbook specification. These assumptions related to the parametrisation of the statistical GM play a very important role in the context of the methodology proposed in Chapter 1 (see also Chapter 26). Several concepts such as weak exogeneity (see Section 19.3, below) and collinearity (see Sections 20.5–6) are only definable with respect to a given parametrisation. Moreover, the statistical GM is turned into an econometric model by reparametrisation, going from the statistical to the theoretical parameters of interest.
The most important difference between the specification [1]–[8] and (1)–(3), however, is the role attributed to the error term. In the context of the latter the probabilistic and sampling model assumptions are made in terms of the error term, not in terms of the observable random variables involved as in [1]–[8]. This difference has important implications in the context of misspecification testing (testing the underlying assumptions) and action thereof. The error term in the context of a statistical model as specified in the present book is by construction white-noise relative to a given information set 𝒜 ⊂ ℱ.
19.3 Discussion of the assumptions

[1] The systematic and non-systematic components
As argued in Chapter 17 (see also Chapter 26) the specification of a statistical model is based on the joint distribution of Z_t, t = 1, 2, ..., T, i.e.

D(Z₁, Z₂, ..., Z_T; ψ) ≡ D(Z; ψ),        (19.14)
which includes the relevant sample and measurement information.
The specification of the linear regression model can be viewed as directly related to (14) and derived by 'reduction' using the assumptions of normality and IID. The independence assumption enables us to reduce D(Z; ψ) into the product of the marginal distributions D(Z_t; ψ_t), t = 1, 2, ..., T, i.e.

D(Z; ψ) = ∏_{t=1}^{T} D(Z_t; ψ_t).        (19.15)

The identical distribution assumption enables us to deduce that ψ_t = ψ for t = 1, 2, ..., T. The next step in the reduction is the following decomposition of D(Z_t; ψ):

D(Z_t; ψ) = D(y_t/X_t; ψ₁)·D(X_t; ψ₂).        (19.16)

The normality assumption, with Σ > 0 and unrestricted, enables us to deduce the weak exogeneity of X_t relative to θ.
The choice of the relevant information set 𝒟_t = {X_t = x_t} depends crucially on the NIID assumptions; if these assumptions are invalid the choice of 𝒟_t will in general be inappropriate. Given this choice of 𝒟_t the systematic and non-systematic components are defined by:

μ_t = E(y_t/X_t = x_t),   u_t = y_t − E(y_t/X_t = x_t).        (19.17)

Under the NIID assumptions μ_t and u_t take the particular forms:

μ_t* = β'x_t,   u_t* = y_t − β'x_t.        (19.18)
Again, if the NIID assumptions are invalid then μ_t* ≠ μ_t, u_t* ≠ u_t and

E(u_t*μ_t*/X_t = x_t) ≠ 0        (19.19)

(see Chapters 21–22).
[2] The parameters of interest

As discussed in Chapter 17, the parameters in terms of which the statistical GM is defined constitute by definition the statistical parameters of interest, and they represent a particular parametrisation of the unknown parameters of the underlying probability model. In the case of the linear regression model the parameters of interest come in the form of θ = (β, σ²) where β = Σ₂₂⁻¹σ₂₁, σ² = σ₁₁ − σ₁₂Σ₂₂⁻¹σ₂₁. As argued above, the parametrisation θ depends not only on D(Z_t; ψ) but also on the assumptions of NIID. Any changes in Z_t or/and the NIID assumptions will in general change the parametrisation.
[3] Exogeneity

In the linear regression model we begin with D(y_t, X_t; ψ) and then we concentrate exclusively on D(y_t/X_t; ψ₁) where

D(y_t, X_t; ψ) = D(y_t/X_t; ψ₁)·D(X_t; ψ₂),        (19.20)

which implies that we choose to ignore the marginal distribution D(X_t; ψ₂). In order to be able to do that, this distribution must contain no information relevant for the estimation of the parameters of interest, θ = (β, σ²), i.e. the stochastic structure of X_t must be irrelevant for any inference on θ. Formalising this intuitive idea we say that X_t is weakly exogenous over the sample period for θ if there exists a reparametrisation with ψ = (ψ₁, ψ₂) such that:

(i) θ is a function of ψ₁ (θ = h(ψ₁));
(ii) ψ₁ and ψ₂ are variation free ((ψ₁, ψ₂) ∈ Ψ₁ × Ψ₂).

Variation free means that for any specific value ψ₂ in Ψ₂, ψ₁ can take any other value in Ψ₁ and vice versa. For more details on exogeneity see Engle, Hendry and Richard (1983). When the above conditions are not satisfied the marginal distribution of X_t cannot be ignored because it contains relevant information for any inference on θ.
[4] No a priori information on θ = (β, σ²)
This assumption is made at the outset in order to avoid imposing invalid testable restrictions on θ. At this stage the only relevant interpretation of θ is as statistical parameters, directly related to ψ₁ in D(y_t/X_t; ψ₁). As such, no a priori information seems likely to be available for θ. Such information is commonly related to the theoretical parameters of interest ξ. Before θ is used to define ξ, however, we need to ensure that the underlying statistical model is well defined (no misspecification) in terms of the observed data chosen.
[5] The observed data matrix X is of full rank

For the data matrix X = (x₁, x₂, ..., x_T)', T × k, we need to assume that rank(X) = k, k < T. The need for this assumption is not at all obvious at this stage, except perhaps as a sample equivalent to the assumption

rank(Σ₂₂) = k,

needed to enable us to define the parameters of interest θ. This is because rank(X) = rank(X'X), and (1/T)(X'X) can be seen as the sample moment equivalent to Σ₂₂.
[6] Normality, linearity, homoskedasticity

The assumption of normality of D(y_t, X_t; ψ) plays an important role in the specification as well as the statistical analysis of the linear regression model. As far as specification is concerned, normality of D(y_t, X_t; ψ) implies that:

(i) D(y_t/X_t; θ) is normal (see Chapter 15);
(ii) E(y_t/X_t = x_t) = β'x_t, a linear function of the observed value x_t of X_t;
(iii) Var(y_t/X_t = x_t) = σ², the conditional variance is free of x_t, i.e. homoskedastic.

Moreover, (i)–(iii) come very close to implying that D(y_t, X_t; ψ) is normal as well (see Chapter 24.2).
[7] Parameter time-invariance

As far as the parameter invariance assumption is concerned, we can see that it stems from the time invariance of the parameters of the distribution D(y_t, X_t; ψ); that is, from the identically distributed (ID) component of the normal IID assumption related to Z_t.

[8] Independent sample
The assumption that y is an independent sample from D(y_t/X_t; θ), t = 1, 2, ..., T, is one of the most crucial assumptions underlying the linear regression model. In econometrics this assumption should be looked at very closely because most economic time series have a distinct time dimension (dependency) which cannot be modelled exclusively in terms of the exogenous random variables X_t. In such cases the non-random sample assumption (see Chapter 23) might be more appropriate.
19.4 Estimation

(1) Maximum likelihood estimators
Let us consider the estimation of the linear regression model as specified by the assumptions [1]–[8] discussed above. Using the assumptions [6] to [8] we can deduce that the likelihood function for the model takes the form

L(β, σ²; y, X) = K(y) ∏_{t=1}^{T} [1/(σ√(2π))] exp[−(1/(2σ²))(y_t − β'x_t)²]
             = K(y)(2πσ²)^{−T/2} exp[−(1/(2σ²)) Σ_{t=1}^{T} (y_t − β'x_t)²].        (19.21)
Hence

log L = c − (T/2) log 2π − (T/2) log σ² − (1/(2σ²)) Σ_{t=1}^{T} (y_t − β'x_t)²,        (19.22)

∂log L/∂β = (1/σ²) Σ_{t=1}^{T} (y_t − β'x_t)x_t = 0,        (19.23)

∂log L/∂σ² = −(T/(2σ²)) + (1/(2σ⁴)) Σ_{t=1}^{T} (y_t − β'x_t)² = 0,        (19.24)

and solving these first-order conditions yields

β̂ = (Σ_{t=1}^{T} x_t x_t')⁻¹ Σ_{t=1}^{T} x_t y_t,        (19.25)

σ̂² = (1/T) Σ_{t=1}^{T} (y_t − β̂'x_t)² = (1/T) Σ_{t=1}^{T} û_t²,        (19.26)

in an obvious notation; these are the maximum likelihood estimators (MLE's) of β and σ², respectively. If we were to write the statistical GM, y_t = β'x_t + u_t, t = 1, 2, ..., T, in the matrix notation form

y = Xβ + u,        (19.27)

where y = (y₁, ..., y_T)' is T × 1, X = (x₁, ..., x_T)' is T × k, and u = (u₁, ..., u_T)' is T × 1, the MLE's take the more suggestive form

β̂ = (X'X)⁻¹X'y   and, for û = y − Xβ̂,   σ̂² = (1/T)û'û.        (19.28)
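The matrix formulae in (19.28) can be sketched directly in code; the design matrix, parameter values and sample size below are assumptions made for illustration, not taken from the text.

```python
import numpy as np

# Minimal sketch of beta_hat = (X'X)^{-1} X'y and sigma2_hat = u_hat'u_hat / T
# on synthetic data; all settings here are invented for illustration.
rng = np.random.default_rng(0)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # constant included
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=T)               # sigma^2 = 0.09

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)   # MLE of beta
u_hat = y - X @ beta_hat                   # residuals
sigma2_hat = u_hat @ u_hat / T             # MLE of sigma^2 (note: biased)

print(beta_hat, sigma2_hat)
```

By the first-order condition (19.23), the residuals are numerically orthogonal to every column of X.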
The information matrix I_T(θ) is defined by

I_T(θ) = E[(∂log L/∂θ)(∂log L/∂θ)'] = E[−∂²log L/∂θ ∂θ'],        (19.29)

where the last equality holds under the assumption that D(y_t/X_t; θ) represents the 'true' probability model. In the above case

−∂²log L/∂β ∂β' = (1/σ²) X'X,
∂²log L/∂σ² ∂β = −(1/σ⁴) Σ_{t=1}^{T} (y_t − β'x_t)x_t,   with E[∂²log L/∂σ² ∂β] = 0,
−∂²log L/∂(σ²)² = −(T/(2σ⁴)) + (1/σ⁶) Σ_{t=1}^{T} u_t²,   with E[−∂²log L/∂(σ²)²] = T/(2σ⁴).

Hence

I_T(θ) = [ (1/σ²)X'X   0 ;  0   T/(2σ⁴) ]   and   [I_T(θ)]⁻¹ = [ σ²(X'X)⁻¹   0 ;  0   2σ⁴/T ].        (19.30)
It is very important to remember that the expectation operator above is defined relative to the probability model D(y_t/X_t; θ).
In order to get some idea as to what the above matrix notation formulae look like, let us consider them for the simple model

y_t = β₁ + β₂x_t + u_t,   t = 1, 2, ..., T,        (19.31)

where

y = (y₁, y₂, ..., y_T)',   X = [1 x₁; 1 x₂; ...; 1 x_T],   β = (β₁, β₂)',   u = (u₁, u₂, ..., u_T)'.

In this case

(X'X)⁻¹ = [1/(T Σ_{t=1}^{T}(x_t − x̄)²)] [ Σx_t²   −Σx_t ;  −Σx_t   T ],

β̂₂ = Σ(x_t − x̄)(y_t − ȳ)/Σ(x_t − x̄)²,   β̂₁ = ȳ − β̂₂x̄,

σ̂² = (1/T) Σ_{t=1}^{T} (y_t − β̂₁ − β̂₂x_t)².

Compare these formulae with those of Chapter 18.
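The closed-form expressions for the simple model can be checked on a small made-up data set (the numbers below are invented for the sketch):

```python
import numpy as np

# Closed-form estimates for y_t = b1 + b2*x_t + u_t; illustrative data only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
T = len(y)

b2 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b1 = y.mean() - b2 * x.mean()
s2_hat = ((y - b1 - b2 * x) ** 2).sum() / T   # MLE of sigma^2

print(b1, b2, s2_hat)
```

The same estimates are obtained from the general matrix formula with X = [1, x] stacked column-wise, since the simple model is just the k = 2 special case.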
One very important feature of the MLE β̂ above is that it preserves the original orthogonality between the systematic and non-systematic components,

y = μ + u,   μ ⊥ u,        (19.32)

in the form of orthogonality between the estimated systematic and non-systematic components,

y = μ̂ + û,   μ̂ ⊥ û,        (19.33)
with μ̂ = Xβ̂ and û = y − Xβ̂, respectively. This is because

μ̂ = P_X y   and   û = (I − P_X)y,        (19.34)

where P_X = X(X'X)⁻¹X' is a symmetric (P_X' = P_X), idempotent (P_X² = P_X) matrix (i.e. it represents an orthogonal projection), and

E(μ̂û') = E(P_X yy'(I − P_X))
        = E(P_X yu'(I − P_X)),   since (I − P_X)y = (I − P_X)u,
        = P_X(I − P_X)σ²,        since E(yu') = σ²I,
        = 0,                     since P_X(I − P_X) = 0.
In other words, the systematic and non-systematic components were estimated in such a way as to preserve the original orthogonality. Geometrically, P_X and (I − P_X) represent orthogonal projectors onto the subspace spanned by the columns of X, say ℳ(X), and onto its orthogonal complement ℳ(X)⊥, respectively. The systematic component was estimated by projecting y onto ℳ(X) and the non-systematic component by projecting y onto ℳ(X)⊥, i.e.

y = P_X y + (I − P_X)y.        (19.35)

Moreover, this orthogonality, which is equivalent to independence in this context, is passed on to the MLE's β̂ and σ̂², since μ̂ is independent of û'û = y'(I − P_X)y, the residual sum of squares, because P_X(I − P_X) = 0 (see Q6, Chapter 15). Given that μ̂ = Xβ̂ and σ̂² = (1/T)û'û, we can deduce that β̂ and σ̂² are independent; see (E2) of Section 7.1.
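The projection algebra above is easy to verify numerically; the data below are arbitrary random draws, an assumption made purely for the sketch.

```python
import numpy as np

# P_X is a symmetric idempotent projector, and the estimated components
# mu_hat = P_X y and u_hat = (I - P_X) y are orthogonal and sum to y.
rng = np.random.default_rng(1)
T, k = 30, 4
X = rng.normal(size=(T, k))   # arbitrary full-rank design
y = rng.normal(size=T)        # arbitrary observations

P = X @ np.linalg.inv(X.T @ X) @ X.T   # orthogonal projector onto span(X)
M = np.eye(T) - P                      # projector onto the orthogonal complement
mu_hat = P @ y
u_hat = M @ y

print(mu_hat @ u_hat)   # inner product is zero up to rounding
```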
Another feature of the MLE's β̂ and σ̂² worth noting is the suggestive similarity between these estimators and the parameters β and σ²:

β = Σ₂₂⁻¹σ₂₁,   β̂ = (X'X/T)⁻¹(X'y/T);        (19.36)

σ² = σ₁₁ − σ₁₂Σ₂₂⁻¹σ₂₁,   σ̂² = (1/T)y'y − (1/T)y'X(X'X)⁻¹X'y.        (19.37)

Looking at these formulae we can see that the MLE's of β and σ² can be derived by substituting the sample moment equivalents for the population moments:

Σ₂₂ → (1/T)(X'X),   σ₂₁ → (1/T)X'y,   σ₁₁ → (1/T)y'y.        (19.38)
Using the orthogonality of the estimated components μ̂ and û we can decompose the variation in y, as measured by y'y, into

y'y = μ̂'μ̂ + û'û = β̂'X'Xβ̂ + û'û.        (19.39)

Using this decomposition we could define the sample equivalent of the multiple correlation coefficient (see Chapter 15) to be

R² = y'X(X'X)⁻¹X'y / y'y = 1 − û'û/(y'y).        (19.40)

This represents the ratio of the variation 'explained' by μ̂ over the total variation and can be used as a measure of goodness of fit for the linear regression model. A similar measure of fit can be constructed using the decomposition of y around its mean ȳ, that is

(y'y − Tȳ²) = (μ̂'μ̂ − Tȳ²) + û'û,        (19.41)
denoted as

TSS = ESS + RSS,        (19.42)

where TSS is the total, ESS the explained and RSS the residual sum of squares (SS stands for sums of squares). The multiple correlation coefficient in this case takes the form

R̃² = (μ̂'μ̂ − Tȳ²)/(y'y − Tȳ²) = 1 − RSS/TSS.        (19.43)

Note that R² was used in Chapter 15 to denote the population multiple correlation coefficient, but in the econometrics literature R² is also used to denote both (40) and (43). Both of the above measures of 'goodness of fit', R² and R̃², have variously been defined to be the sample multiple correlation coefficient in the econometric literature. Caution should be exercised when reading different textbooks because R² and R̃² have different properties. For example, 0 ≤ R̃² ≤ 1 is valid only when one of the variables in X_t is the constant term. On the role of the constant term see Appendix 19.1.
One serious objection to the use of R̃² as a goodness-of-fit measure is the fact that as the number k of regressors increases, R̃² increases as well, irrespective of whether the regressors are relevant or not. For this reason a 'corrected' goodness-of-fit measure is defined by

R̄² = 1 − [û'û/(T − k)] / [(y'y − Tȳ²)/(T − 1)].        (19.44)
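These goodness-of-fit measures can be sketched on synthetic data (the design and coefficients below are invented). With a constant among the regressors the centered measure lies in [0, 1], and the corrected measure never exceeds it:

```python
import numpy as np

# Centered goodness-of-fit measure and its degrees-of-freedom correction;
# everything here is illustrative synthetic data.
rng = np.random.default_rng(2)
T, k = 50, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # includes constant
y = X @ np.array([1.0, 0.8, -0.4]) + rng.normal(size=T)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat
TSS = ((y - y.mean()) ** 2).sum()
RSS = (u_hat ** 2).sum()
ESS = TSS - RSS

R2 = 1 - RSS / TSS                               # centered measure
R2_bar = 1 - (RSS / (T - k)) / (TSS / (T - 1))   # corrected measure

print(R2, R2_bar)
```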
The correction is the division of the statistics involved by their corresponding degrees of freedom; see Theil (1971).

(2) An empirical example
In order to illustrate some of the concepts and results introduced so far let us consider estimating a transactions demand for money. Using the simplest form of a demand function we can postulate the theoretical model:

M^D = h(Y, P, I),        (19.45)

where M^D is the transactions demand for money, Y is income, P is the price level and I is the short-run interest rate referring to the opportunity cost of holding transactions money. Assuming a multiplicative form for h(·) the demand function takes the form

M^D = A Y^{α₁} P^{α₂} I^{α₃}        (19.46)

or

ln M^D = α₀ + α₁ ln Y + α₂ ln P + α₃ ln I,        (19.47)

where ln stands for logₑ and α₀ = ln A.
For expositional purposes let us adopt the commonly accepted approach to econometric modelling (see Chapter 1) in an attempt to highlight some of the problems associated with it. If we were to ignore the discussion on econometric modelling in Chapter 1 and proceed by using the usual 'textbook' approach, the next step is to transform the theoretical model into an econometric model by adding an error term, i.e. the econometric model is

m_t = α₀ + α₁y_t + α₂p_t + α₃i_t + u_t,        (19.48)

where m_t = ln M_t, y_t = ln Y_t, p_t = ln P_t, i_t = ln I_t and u_t ~ NI(0, σ²). Choosing some observed data series corresponding to the theoretical variables, M, Y, P and I, say:

M_t – M1 money stock;
Y_t – real consumers' expenditure;
P_t – implicit price deflator of Y_t;
I_t – interest rate on 7 days' deposit account

(see Chapter 17 and its appendix for these data series), respectively, the above equation can be transformed into the linear regression statistical GM:

m_t = β₀ + β₁y_t + β₂p_t + β₃i_t + u_t.        (19.49)

Estimation of this equation for the period 1963i–1982iv (T = 80) using
quarterly seasonally adjusted (for convenience) data yields

β̂ = (2.896, 0.690, 0.865, −0.055)',   s² = 0.00155,
TSS = 24.954,   ESS = 24.836,   RSS = 0.118,   R̃² = 0.9953,   R̄² = 0.9951.

That is, the estimated equation takes the form

m_t = 2.896 + 0.690y_t + 0.865p_t − 0.055i_t + û_t.        (19.50)
The danger at this point is to get carried away and start discussing the plausibility of the sign and size of the estimated 'elasticities' (?). For example, we might be tempted to argue that the estimated 'elasticities' have both a 'correct' sign and the size assumed on a priori grounds. Moreover, the 'goodness of fit' measures show that we explain 99.5% of the variation. Taken together these results 'indicate' that (50) is a good empirical model for the transactions demand for money. This, however, would be rather premature in view of the fact that before any discussion of a priori economic theory information we need to have a well-defined estimated statistical model which at least summarises the sample information adequately. Well defined in the present context refers to ensuring that the assumptions underlying the statistical model adopted are valid. This is because any formal testing of a priori restrictions can only be based on the underlying assumptions, which when invalid render the testing procedures incorrect.
Looking at the above estimated equation in view of the discussion of econometric modelling in Chapter 1, several objections might be raised:

(i) The observed data chosen do not correspond one-to-one to the theoretical variables, and thus the estimable model might be different from the theoretical model (see Chapter 23).
(ii) The sampling model of an independent sample seems questionable in view of the time paths of the observed data (see Fig. 17.1).
(iii) The high R̃² (and R̄²) is due to the fact that the data series for M_t and P_t have a very similar time trend (see Fig. 17.1(a) and (c)). If we look at the time path of the actual (y_t) and fitted (ŷ_t) values we notice that ŷ_t 'tracks' (explains) largely the trend and very little else (see Fig. 19.1). An obvious way to get some idea of the trend's contribution to R̃² is to subtract p_t from both sides of the money equation in an attempt to 'detrend' the dependent variable.
Fig. 19.1. Actual y_t = ln M_t and fitted ŷ_t from (19.50).
Fig. 19.2. Actual y_t = ln(M/P)_t and fitted ŷ_t from (19.51).
In Fig. 19.2 the actual and fitted values of the 'largely' detrended dependent variable (m_t − p_t) are shown to emphasise the point. The new regression equation yielded

(m_t − p_t) = 2.896 + 0.690y_t − 0.135p_t − 0.055i_t + û_t,        (19.51)

R̃² = 0.468,   R̄² = 0.447,   s² = 0.00155.
Looking at this estimated equation we can see that the coefficients of the constant, y_t and i_t are identical in value to the previous estimated equation. The estimated coefficient of p_t is, as expected, one minus the original estimate, and the s² is identical for both estimated equations. These suggest that the two estimated equations are identical as far as the estimated coefficients are concerned. This is a special case of a more general result related to arbitrary linear combinations of the x_{it}s subtracted from both sides of the statistical GM. In order to see this let us subtract γ'x_t from both sides of the statistical GM:

y_t − γ'x_t = (β − γ)'x_t + u_t,   or   y_t* = β*'x_t + u_t,        (19.52)
in an obvious notation. It is easy to see that the non-systematic component as well as σ² remain unchanged. Moreover, in view of the equality

û* = û,        (19.53)

where û* = y* − Xβ̂*, β̂* = (X'X)⁻¹X'y* = β̂ − γ, we can deduce that

s*² = û*'û*/(T − k) = û'û/(T − k) = s².        (19.54)
On the other hand, R̃² is not invariant to this transformation because

R̃*² = 1 − û'û/(y*'y* − Tȳ*²) ≠ R̃².        (19.55)

As we can see, the R̃² of the 'detrended' dependent variable equation is less than half of the original. This confirms the suggestion that the trend in p_t contributes significantly to the high value of the original R̃². It is important to note at this stage that trending data series can be a problem when the asymptotic properties of the MLE's are used uncritically (see sub-section (4) below).
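The invariance results in (19.52)–(19.54) can be illustrated numerically: subtracting γ'x_t from y_t shifts the coefficient estimates by γ and leaves the residuals unchanged, while the centered goodness-of-fit measure changes. All data below are synthetic assumptions for the sketch.

```python
import numpy as np

# Subtracting gamma'x_t from the dependent variable: coefficients shift by gamma,
# residuals (and hence s^2) are unchanged, the centered fit measure is not.
rng = np.random.default_rng(3)
T, k = 60, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([2.0, 0.7, -0.3]) + rng.normal(scale=0.5, size=T)

gamma = np.array([0.0, 0.0, -1.0])   # e.g. move one regressor across, as with m_t - p_t
y_star = y - X @ gamma

b = np.linalg.lstsq(X, y, rcond=None)[0]
b_star = np.linalg.lstsq(X, y_star, rcond=None)[0]
u = y - X @ b
u_star = y_star - X @ b_star

def R2(dep, resid):
    return 1 - (resid @ resid) / ((dep - dep.mean()) ** 2).sum()

print(b_star - (b - gamma), R2(y, u), R2(y_star, u_star))
```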
(3) Properties of the MLE θ̂ = (β̂, σ̂²) — finite sample

In order to decide whether the MLE θ̂ is a 'good' estimator of θ we need to consider its properties. The finite sample properties (see Chapters 12 and 13) will be considered first and then the asymptotic properties. θ̂, being a MLE, satisfies certain properties by definition:

(1) For a Borel function h(·) the MLE of h(θ) is h(θ̂). For example, the MLE of log(β'β) is log(β̂'β̂).
(2) If a minimal sufficient statistic τ(y) exists, then θ̂ must be a function of it.
Using the Lehmann–Scheffé theorem (see Chapter 12) we can deduce that the values of y for which the ratio

D(y/X; θ)/D(y₀/X; θ) = (2πσ²)^{−T/2} exp[−(1/(2σ²))(y − Xβ)'(y − Xβ)] / {(2πσ²)^{−T/2} exp[−(1/(2σ²))(y₀ − Xβ)'(y₀ − Xβ)]}        (19.56)

is independent of θ are those with y₀'y₀ = y'y and X'y₀ = X'y. Hence the minimal sufficient statistic is τ(y) = (τ₁(y), τ₂(y)) = (y'y, X'y), and β̂ = (X'X)⁻¹τ₂(y), σ̂² = (1/T)(τ₁(y) − τ₂(y)'(X'X)⁻¹τ₂(y)) are indeed functions of τ(y).
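That the MLE's depend on the data only through τ(y) = (y'y, X'y) can be checked by computing them from these two statistics alone; the data below are synthetic, chosen only for the sketch.

```python
import numpy as np

# The MLE's are functions of the minimal sufficient statistic (y'y, X'y) alone.
rng = np.random.default_rng(4)
T, k = 40, 3
X = rng.normal(size=(T, k))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=T)

t1 = y @ y       # tau_1(y) = y'y
t2 = X.T @ y     # tau_2(y) = X'y

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ t2                          # beta_hat from tau(y)
sigma2_hat = (t1 - t2 @ XtX_inv @ t2) / T        # sigma2_hat from tau(y)

# direct computation for comparison
beta_direct = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta_direct
print(beta_hat, sigma2_hat)
```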
In order to discuss any other properties of the MLE θ̂ of θ we need to derive the sampling distribution of θ̂. Given that β̂ and σ̂² are independent we can consider them separately.

The distribution of β̂

β̂ = (X'X)⁻¹X'y = Ly,        (19.57)

where L = (X'X)⁻¹X' is a k × T matrix of known constants. That is, β̂ is a linear function of the normally distributed random vector y. Hence β̂ ~ N(LXβ, σ²LL') or, from N1 of Chapter 15,

β̂ ~ N(β, σ²(X'X)⁻¹).        (19.58)
From the sampling distribution (58) we can deduce the following properties for β̂:

(3(i)) β̂ is an unbiased estimator of β, since E(β̂) = β, i.e. the sampling distribution of β̂ has mean equal to β.
(4(i)) β̂ is a fully efficient estimator of β, since Cov(β̂) = σ²(X'X)⁻¹, i.e. Cov(β̂) achieves the Cramér–Rao lower bound; see (30) above.
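The sampling distribution in (19.58) can be illustrated by Monte Carlo, holding X fixed across replications; sample size, parameter values and the number of replications below are all assumptions for the sketch.

```python
import numpy as np

# With X fixed, repeated draws of y give an empirical mean of beta_hat close to
# beta and an empirical covariance close to sigma^2 (X'X)^{-1}.
rng = np.random.default_rng(5)
T, k, reps = 50, 2, 5000
sigma = 1.0
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.empty((reps, k))
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=T)
    draws[r] = XtX_inv @ (X.T @ y)   # beta_hat for this replication

print(draws.mean(axis=0), np.cov(draws.T))
```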
The distribution of σ̂²

σ̂² = (1/T)(y − Xβ̂)'(y − Xβ̂) = (1/T)û'û = (1/T)u'M_X u,   where M_X = I − P_X.        (19.59)

From (Q2) of Chapter 15 we can deduce that

(u'M_X u)/σ² ~ χ²(tr M_X),        (19.60)
where tr M_X refers to the trace of M_X (tr A = Σᵢ aᵢᵢ for A n × n). Now

tr M_X = tr I − tr X(X'X)⁻¹X'   (since tr(A + B) = tr A + tr B)
       = T − tr (X'X)⁻¹(X'X)    (since tr(AB) = tr(BA))
       = T − k.

Hence, we can deduce that

(Tσ̂²/σ²) ~ χ²(T − k).        (19.61)

Intuitively we can explain this result as saying that (u'M_X u)/σ² represents the summation of the squares of T − k independent standard normal components.
Using (61) we can deduce that

E(Tσ̂²/σ²) = T − k   and   Var(Tσ̂²/σ²) = 2(T − k)

(see Appendix 6.1). These results imply that

E(σ̂²) = [(T − k)/T]σ² ≠ σ²,   Var(σ̂²) = 2(T − k)σ⁴/T² < 2σ⁴/T,

where 2σ⁴/T is the Cramér–Rao lower bound. That is:

(3(ii)) σ̂² is a biased estimator of σ²; and
(4(ii)) σ̂² is not a fully efficient estimator of σ².
However, (3(ii)) implies that for

s² = û'û/(T − k),   [(T − k)s²/σ²] ~ χ²(T − k),        (19.62)

and

E(s²) = σ²,   Var(s²) = 2σ⁴/(T − k) > 2σ⁴/T,        (19.63)

where 2σ⁴/T is the Cramér–Rao bound.
That is, s² is an unbiased estimator of σ², although it does not quite achieve the Cramér–Rao lower bound given by the information matrix (30) above. It turns out, however, that no other unbiased estimator of σ² achieves that bound, and among such estimators s² has minimum variance. In statistical inference relating to the linear regression model s² is preferred to σ̂² as an estimator of σ².
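The bias of σ̂² and the unbiasedness of s² follow from (19.61) and can be illustrated by simulating u'M_X u; all settings below are invented for the sketch.

```python
import numpy as np

# Tsigma2_hat/sigma^2 ~ chi^2(T-k), so dividing the residual sum of squares
# by T gives a downward-biased estimator while dividing by T-k does not.
rng = np.random.default_rng(6)
T, k, reps = 20, 4, 20000
sigma2 = 2.0
X = rng.normal(size=(T, k))
P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(T) - P                      # M_X = I - P_X

rss = np.empty(reps)
for r in range(reps):
    u = rng.normal(scale=np.sqrt(sigma2), size=T)
    rss[r] = u @ M @ u                 # u'M_X u = residual sum of squares

sigma2_hat_mean = (rss / T).mean()     # estimates ((T-k)/T)*sigma2, i.e. biased down
s2_mean = (rss / (T - k)).mean()       # estimates sigma2

print(sigma2_hat_mean, s2_mean)
```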
The sampling distributions of the estimators β̂ and s² involve the