CHAPTER 21

The linear regression model III — departures from the assumptions underlying the probability model
The purpose of this chapter is to consider various forms of departures from the assumptions of the probability model:

[6]  (i)   D(y_t/X_t; θ) is normal;
     (ii)  E(y_t/X_t = x_t) = β'x_t, linear in x_t;
     (iii) Var(y_t/X_t = x_t) = σ², homoskedastic;
[7]  θ = (β, σ²) are time-invariant.
In each of the Sections 2-5 the above assumptions will be relaxed one at a time, retaining the others, and the following interrelated questions will be discussed:
(a) what are the implications of the departures considered?
(b) how do we detect such departures? and
(c) how do we proceed if departures are detected?
It is important to note at the outset that the following discussion, which considers individual assumptions being relaxed separately, limits the scope of misspecification analysis because it is rather rare to encounter such conditions in practice. More often than not various assumptions are invalid simultaneously. This is considered in more detail in Section 1. Section 6 discusses the problem of structural change which constitutes a particularly important form of departure from [7].
21.1  Misspecification testing and auxiliary regressions
Misspecification testing refers to the testing of the assumptions underlying a statistical model. In its context the null hypothesis is uniquely defined as the assumption(s) in question being valid. The alternative takes a particular form of departure from the null which is invariably non-unique. This is
because departures from a given assumption can take numerous forms with
the specified alternative being only one such form. Moreover, most
misspecification tests are based on the questionable presupposition that the
other assumptions of the model are valid. This is because joint
misspecification testing is considerably more involved. For these reasons
the choice in a misspecification test is between rejecting and not rejecting
the null; accepting the alternative should be excluded at this stage.
An important implication for the question on how to proceed if the null is
rejected is that before any action is taken the results of the other
misspecification tests should also be considered. It is often the case that a
particular form of departure from one assumption might also affect other
assumptions. For example when the assumption of sample independence
[8] is invalid the other misspecification tests are influenced (see Chapter 22).
In general the way to proceed when any of the assumptions [6]-[8] are invalid is first to narrow down the source of the departures by relating them back to the NIID assumption of {Z_t, t ∈ T} and then respecify the model taking into account the departure from NIID. The respecification of the model involves a reconsideration of the reduction from D(Z_1, Z_2, ..., Z_T; ψ) to D(y_t/X_t; θ) so as to account for the departures from the assumptions involved. As argued in Chapters 19-20 this reduction comes in the form of:
D(Z_1, ..., Z_T; ψ) = ∏_{t=1}^T D(Z_t; ψ)   (21.1)

                    = ∏_{t=1}^T D(y_t/X_t; ψ_1) D(X_t; ψ_2),   (21.2)
and involves the independence and the identically distributed assumptions in (1). The normality assumption plays an important role in defining the parametrisation of interest θ = (β, σ²) as well as the weak exogeneity condition. Once the source of the detected departure is related to one or more of the NIID assumptions the respecification takes the form of an alternative reduction. This is illustrated most vividly in Chapter 22 where assumption [8] is discussed. It turns out that when [8] is invalid not only are the results in Chapter 19 invalid but the other misspecification tests are 'largely' inappropriate as well. For this reason it is advisable in practice to test assumption [8] first and then proceed with the other assumptions if [8] is not rejected. The sequence of misspecification tests considered in what follows is chosen only for expositional purposes.
With the above discussion in mind let us consider the question of general
procedures for the derivation of misspecification tests. In cases where the
alternative in a misspecification test is given a specific parametric form the
various procedures encountered in specification testing (F-type tests, Wald,
Lagrange multiplier and likelihood ratio) can be easily adapted to apply in
the present context. In addition to these procedures several specific
misspecification test procedures have been proposed in the literature (see
White (1982), Bierens (1982), inter alia). Of particular interest in the present
book are the procedures based on the ‘omitted variables’ argument which
lead to auxiliary regressions (see Ramsey (1969), (1974), Pagan and Hall
(1983), Pagan (1984), inter alia). This particular procedure is given a prominent role in what follows because it is easy to implement in practice and it provides a common-sense interpretation of most other misspecification tests.
The ‘omitted variables’ argument was criticised in Section 20.2 because it
was based on the comparison of two ‘non-comparable’ statistical GM’s.
This was because the information sets underlying the latter were different. It
was argued, however, that the argument could be reformulated by
postulating the same sample information sets. In particular, if both parametrisations can be derived from D(Z_1, Z_2, ..., Z_T; ψ) by using alternative reduction arguments then the two statistical GM's can be made comparable.
Let {Z_t, t ∈ T} be a vector stochastic process defined on the probability space (S, ℱ, P(·)) which includes the stochastic variables of interest. In Chapter 17 it was argued that for a given 𝒢_t ⊂ ℱ,

y_t = E(y_t/𝒢_t) + u_t,  t ∈ T,   (21.3)

defines a general statistical GM with

μ_t = E(y_t/𝒢_t)  and  u_t = y_t − E(y_t/𝒢_t)   (21.4)

satisfying some desirable properties by construction, including the orthogonality condition:

E(μ_t u_t) = 0,  t ∈ T.   (21.5)

It is important to note, however, that (3)-(4) as defined above are just 'empty boxes'. These are filled when {Z_t, t ∈ T} is given a specific probabilistic structure such as NIID. In the latter case (3)-(4) take the specific forms:
y_t = β'x_t + u_t*,  t ∈ T,   (21.6)

μ_t* = β'x_t  and  u_t* = y_t − β'x_t,   (21.7)

with the conditioning information set being

𝒢_t = {X_t = x_t}.   (21.8)

When any of the assumptions in NIID are invalid, however, the various properties of μ_t and u_t no longer hold for μ_t* and u_t*. In particular the
orthogonality condition (5) is invalid. The non-orthogonality

E(μ_t* u_t*) ≠ 0,  t ∈ T,   (21.9)
can be used to derive various misspecification tests. If we specify the
alternative in a parametric form which includes the null as a special case (9)
could be used to derive misspecification tests based on certain auxiliary
regressions.
In order to illustrate this procedure let us consider two important parametric forms which can provide the basis of several misspecification tests:

(a)  g*(x_t) = Σ_{i=2}^m γ_i μ_t^i,  μ_t = β'x_t;   (21.10)

(b)  g(x_t) = a_0 + Σ_{i=1}^k b_i x_it + Σ_{i=1}^k Σ_{j≥i}^k c_ij x_it x_jt + Σ_{i=1}^k Σ_{j≥i}^k Σ_{l≥j}^k d_ijl x_it x_jt x_lt.   (21.11)
The polynomial g*(x_t) is related to RESET type tests (see Ramsey (1969)) and g(x_t) is known as the Kolmogorov–Gabor polynomial (see Ivakhnenko (1984)). Both of these polynomials can be used to specify a general parametric form for the alternative systematic component:

μ_t = β_0'x_t + γ_0'z_t*,   (21.12)

where z_t* represents known functions of the variables Z_{t−1}, ..., Z_1, X_t. This gives rise to the alternative statistical GM

y_t = β_0'x_t + γ_0'z_t* + ε_t,  t ∈ T,   (21.13)
which includes (6) as a special case under

H_0: γ_0 = 0,  with H_1: γ_0 ≠ 0.   (21.14)

A direct comparison between (13) and (6) gives rise to the auxiliary regression

u_t* = (β_0 − β)'x_t + γ_0'z_t* + ε_t,   (21.15)

whose operational form

û_t = (β_0 − b)'x_t + γ_0'z_t* + ε_t   (21.16)
can be used to test (14) directly. The most obvious test is the F-type test
discussed in Sections 19.5 and 20.3. The F-test will take the general form
FT(y) = ((RRSS − URSS)/URSS)((T − k*)/m),   (21.17)
where RRSS and URSS refer to the residual sums of squares from (6) and (16) (or (13)) respectively, k* being the number of parameters in (13) and m the number of restrictions.
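As a concrete illustration (not from the original text), the F-type comparison in (21.17) can be sketched in a few lines of numpy; the function name `ftype_misspec_test` and the simulated data are hypothetical:

```python
import numpy as np

def ftype_misspec_test(y, X, Z):
    """F-type statistic (21.17): compare the restricted GM (regressors X)
    with the augmented one (regressors [X, Z]), Z holding the z_t* terms."""
    def rss(W):
        b, *_ = np.linalg.lstsq(W, y, rcond=None)
        e = y - W @ b
        return float(e @ e)
    W = np.hstack([X, Z])
    rrss, urss = rss(X), rss(W)
    T, kstar = W.shape            # k*: parameters in the augmented GM
    m = Z.shape[1]                # m: number of restrictions tested
    return ((rrss - urss) / urss) * ((T - kstar) / m)

rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=T)   # the null GM is true here
Z = X[:, 1:] ** 2                                   # a candidate omitted term
FT = ftype_misspec_test(y, X, Z)
```

Since the augmented regression can only lower the residual sum of squares, the statistic is non-negative by construction; under the null it should be small relative to the F(m, T − k*) critical value.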
This procedure could be easily extended to the higher central moments of y_t:

E(u_t^r/X_t = x_t),  r > 2.   (21.18)
For further discussion see Spanos (1985b).
21.2  Normality
As argued above, the assumptions underlying the probability model are all interrelated and they stem from the fact that D(y_t, X_t; ψ) is assumed to be multivariate normal. When D(y_t, X_t; ψ) is assumed to be some other multivariate distribution the regression function takes a more general form (not necessarily linear),

E(y_t/X_t = x_t) = h(ψ, x_t),   (21.19)

and the skedasticity function is not necessarily free of x_t,

Var(y_t/X_t = x_t) = g(ψ, x_t).   (21.20)
Several examples of regression and skedasticity functions in the bivariate
case were considered in Chapter 7. In this section, however, we are going to
consider relaxing the assumption of normality only, keeping linearity and
homoskedasticity. In particular we will consider the consequences of
assuming

(y_t/X_t = x_t) ~ D(β'x_t, σ²),   (21.21)
where D(-) is an unknown distribution, and discuss the problem of testing
whether D(-) is in fact normal or not.
(1)  Consequences of non-normality
Let us consider the effect of the non-normality assumption in (21) on the
specification, estimation and testing in the context of the linear regression
model discussed in Chapter 19.
As far as specification (see Section 19.2) is concerned only marginal changes are needed. After removing assumption [6](i) the other assumptions can be reinterpreted in terms of D(β'x_t, σ²). This suggests that
relaxing normality but retaining linearity and homoskedasticity might not
constitute a major break from the linear regression framework.
The first casualty of (21) as far as estimation (see Section 19.4) is concerned is the method of maximum likelihood itself, which cannot be used unless the form of D(·) is known. We could, however, use the least-squares method of estimation briefly discussed in Section 13.1, where the form of the underlying distribution is 'apparently' not needed.

Least-squares is an alternative method of estimation which is historically much older than maximum likelihood or the method of moments. The least-squares method estimates the unknown parameters θ by minimising the squares of the distance between the observable random variables y_t, t ∈ T, and h_t(θ) (a function of θ purporting to approximate the mechanism giving rise to the observed values y_t), weighted by a precision factor 1/κ_t which is assumed known, i.e.

min_{θ∈Θ} Σ_t ((y_t − h_t(θ))/κ_t)².   (21.22)
It is interesting to note that this method was first suggested by Gauss in 1794
as an alternative to maximising what we, nowadays, call the log-likelihood
function under the normality assumption (see Section 13.1 for more details).
In an attempt to motivate the least-squares method he argued that:
the most probable value of the desired parameters will be that in which the
sum of the squares of differences between the actually observed and
computed values multiplied by numbers that measure the degree of
precision, is a minimum ....
This clearly shows a direct relationship between the normality assumption
and the least-squares method of estimation. It can be argued, however, that
the least-squares method can be applied to estimation problems without
assuming normality. In relation to such an argument Pearson (1920)
warned that:
we can only assert that the least-squares methods are theoretically
accurate on the assumption that our observations ... obey the normal law.
. Hence in disregarding normal distributions and claiming great
generality ... by merely using the principle of least-squares ... the
apparent generalisation has been gained merely at the expense of
theoretical validity ....
Despite this forceful argument let us consider the estimation of the linear
regression model without assuming normality, but retaining linearity and
homoskedasticity as in (21).
The least-squares method suggests minimising

l(β) = Σ_{t=1}^T ((y_t − β'x_t)/σ)²,   (21.23)

or, equivalently:

l(β) = Σ_{t=1}^T (y_t − β'x_t)² = (y − Xβ)'(y − Xβ),   (21.24)

∂l(β)/∂β = −2X'(y − Xβ) = 0.   (21.25)
Solving the system of normal equations (25) (assuming that rank(X) = k) we get the ordinary least-squares (OLS) estimator of β:

b = (X'X)^{-1}X'y.   (21.26)

The OLS estimator of σ² is

s² = (1/(T − k))(y − Xb)'(y − Xb).   (21.27)

Let us consider the properties of the OLS estimators b and s² in view of the fact that the form of D(β'x_t, σ²) is not known.
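The OLS formulae (21.26)-(21.27) are straightforward to compute; a minimal numpy sketch with simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
beta = np.array([1.0, -0.5, 2.0])
y = X @ beta + rng.normal(scale=0.5, size=T)

b = np.linalg.solve(X.T @ X, X.T @ y)    # b = (X'X)^{-1} X'y, as in (26)
resid = y - X @ b
s2 = resid @ resid / (T - k)             # s^2, as in (27)
```

Solving the normal equations directly (rather than inverting X'X) is the standard numerically stable route; by construction the residuals satisfy X'(y − Xb) = 0, i.e. equation (25).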
Finite sample properties of b and s²

Although b is identical to β̂ (the MLE of β) the similarity does not extend to their properties unless D(y_t/X_t; θ) is normal.
(a) Since b = Ly, with L = (X'X)^{-1}X', the OLS estimator is linear in y.
(b) Using the properties of the expectation operator E(·) we can deduce that E(b) = E(β + Lu) = β + LE(u) = β, i.e. b is an unbiased estimator of β.
(c) E(b − β)(b − β)' = E(Luu'L') = σ²LL' = σ²(X'X)^{-1}.
Given that we have the mean and variance of b but not its distribution, what
other properties can we deduce?
Clearly, we cannot say anything about sufficiency or full efficiency without knowing D(y_t/X_t; θ), but hopefully we could discuss relative efficiency within the class of estimators satisfying (a) and (b). The Gauss–Markov theorem provides us with such a result.

Gauss–Markov theorem
Under the assumption (21), b, the OLS estimator of β, has minimum variance among the class of linear and unbiased estimators (for a proof see Judge et al. (1982)).
(d) As far as s² is concerned, we can show that

E(s²) = σ², i.e. s² is an unbiased estimator of σ²,

using only the properties of the expectation operator relative to D(β'x_t, σ²).
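Properties (b) and (d) indeed require no normality; a small Monte Carlo sketch (hypothetical fixed design, with uniform and hence clearly non-normal errors) illustrates the unbiasedness of b:

```python
import numpy as np

rng = np.random.default_rng(2)
T, R = 50, 2000
X = np.column_stack([np.ones(T), np.linspace(-1.0, 1.0, T)])  # fixed design
beta = np.array([0.5, 1.5])
L = np.linalg.solve(X.T @ X, X.T)        # b = Ly with L = (X'X)^{-1}X'

draws = np.empty((R, 2))
for r in range(R):
    # uniform errors: mean zero and homoskedastic, but not normal
    u = rng.uniform(-1.0, 1.0, size=T)
    draws[r] = L @ (X @ beta + u)

b_mean = draws.mean(axis=0)              # should be close to beta
```

Averaged over the replications, b is centred on β, as (b) asserts, even though each error draw is far from Gaussian.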
In order to test any hypotheses or set up confidence intervals for
θ = (β, σ²) we need the distribution of the OLS estimators b and s². Thus, unless we specify the form of D(β'x_t, σ²), no test and/or confidence interval statistics can be derived. The question which naturally arises is to what extent 'asymptotic theory' can at least provide us with large sample results.
Asymptotic distribution of b and s²

Lemma 21.1
Under assumption (21),

√T(b − β) ~ N(0, σ²Q_x^{-1}),   (21.28)

where

Q_x = lim_{T→∞}(X'X/T)   (21.29)

is finite and non-singular.
Lemma 21.2
Under (21) we can deduce that

√T(s² − σ²) ~ N(0, μ₄ − σ⁴),   (21.30)

where μ₄ refers to the fourth central moment of D(y_t/X_t; θ), assumed to be finite (see Schmidt (1976)). Note that in the case where D(y_t/X_t; θ) is normal,

μ₄ = 3σ⁴ ⇒ √T(s² − σ²) ~ N(0, 2σ⁴).   (21.31)
Lemma 21.3
Under (21),

b →_P β   (21.32)

(provided lim_{T→∞}(X'X)^{-1} = 0)   (21.33)

and

s² →_P σ².   (21.34)
From the above lemmas we can see that although the asymptotic distribution of b coincides with the asymptotic distribution of the MLE, this is not the case with s². The asymptotic distribution of b does not depend on
D(y_t/X_t; θ) but that of s² does via μ₄. The question which naturally arises is to what extent the various results related to tests about θ = (β, σ²) (see Section 19.5) are at least asymptotically justifiable. Let us consider the F-test for H_0: Rβ = r against H_1: Rβ ≠ r. From lemma 21.1 we can deduce that under H_0, √T(Rb − r) ~ N(0, σ²RQ_x^{-1}R'), which implies that

(Rb − r)'[σ²RQ_x^{-1}R'/T]^{-1}(Rb − r) ~ χ²(m).   (21.35)
Using this result in conjunction with lemma 21.3 we can deduce that

τ(y) = (1/m)(Rb − r)'[s²R(X'X)^{-1}R']^{-1}(Rb − r),  mτ(y) ~ χ²(m),   (21.36)

under H_0, and thus the F-test is robust with respect to the non-normality assumption (21) above. Although the asymptotic distribution of τ(y) is chi-square, in practice the F-distribution provides a better approximation for a small T (see Section 19.5). This is particularly true when D(β'x_t, σ²) has
heavy tails. The significance t-test, being a special case of the F-test,

τ_i(y) = b_i/(s√((X'X)^{-1}_{ii})) ~ N(0, 1)  under H_0: β_i = 0,   (21.37)

is also asymptotically justifiable and robust relative to the non-normality assumption (21) above.
Because of lemma 21.2, intuition suggests that the testing results in relation to σ² will not be robust relative to the non-normality assumption. Given that the asymptotic distribution of s² depends on μ₄, or α₄ = μ₄/σ⁴, the kurtosis coefficient, any departures from normality (where α₄ = 3) will seriously affect the results based on the normality assumption. In particular the size α and power of these tests can be very different from the ones based on the postulated value of α₄. This can seriously affect all tests which depend on the distribution of s², such as some heteroskedasticity and structural change tests (see Sections 21.4-21.6 below). In order to get non-normality robust tests in such cases we need to modify them to take account of μ₄.
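Lemma 21.2 can be illustrated by simulation: the variance of √T(s² − σ²) is close to μ₄ − σ⁴ (= 2σ⁴ under normality) but much larger for heavy-tailed data. A rough sketch, using sample variances of raw draws rather than regression residuals for simplicity:

```python
import numpy as np

rng = np.random.default_rng(3)
T, R = 500, 4000

def var_sqrtT_s2(draw):
    """Empirical variance of sqrt(T)(s^2 - sigma^2) over R replications."""
    s2 = np.array([np.var(draw(), ddof=1) for _ in range(R)])
    return T * s2.var()

# normal errors (unit variance): mu4 - sigma^4 = 3 - 1 = 2
v_normal = var_sqrtT_s2(lambda: rng.normal(size=T))
# Student's t(5) errors: alpha4 = 3 + 6/(5 - 4) = 9, so mu4 - sigma^4 is far larger
v_heavy = var_sqrtT_s2(lambda: rng.standard_t(df=5, size=T))
```

The contrast between the two figures is exactly why tests built on the distribution of s² lose their nominal size under heavy tails.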
(2)  Testing for departures from normality

Tests for normality can be divided into parametric and non-parametric tests depending on whether the alternative is given a parametric form or not.
(a)  Non-parametric tests

The Kolmogorov–Smirnov test
Based on the assumption that {u_t/X_t, t ∈ T} is an IID process we can use the results of Appendix 11.1 to construct a test with rejection region

C₁ = {y: √T D_T* > c_α},   (21.38)

where D_T* refers to the Kolmogorov–Smirnov test statistic in terms of the residuals. Typical values of c_α are:

α:    0.10   0.05   0.01
c_α:  1.23   1.36   1.67   (21.39)
For a most illuminating discussion of this and similar tests see Durbin
(1973).
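A minimal sketch of the statistic √T D_T* for standardised residuals, using only numpy and the error function; treating the estimated mean and variance as known makes this only an approximation to the test described above (hypothetical residuals, cut-off 1.36 from (39) at the 5% level):

```python
import numpy as np
from math import erf, sqrt

def ks_stat(resid):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    standardised residuals and the standard normal CDF."""
    z = np.sort((resid - resid.mean()) / resid.std(ddof=1))
    T = len(z)
    Phi = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])
    up = np.arange(1, T + 1) / T - Phi     # deviations above the normal CDF
    down = Phi - np.arange(0, T) / T       # and below it
    return max(up.max(), down.max())

rng = np.random.default_rng(4)
resid = rng.normal(size=400)                  # residuals of a well-specified model
stat = np.sqrt(len(resid)) * ks_stat(resid)   # compare with c_alpha, e.g. 1.36
```

Because the D statistic needs both one-sided deviations, the empirical CDF is evaluated just before and just after each jump.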
The Shapiro—Wilk test
This test is based on the ratio of two different estimators of the variance ơ?.
n
z=|
>
t=
2
1
Aer (Urey 1)
where ti1)
n=
T
T
| ), ap
(21.40)
it =1
are the ordered residuals,
if T iseven
or
T-1
n=——
if T is odd,
and a,; is a weight coefficient tabulated by Shapiro and Wilk (1965) for
sample sizes 2< 7< 50. The rejection region takes the form:
Cry=ty: W
(21.41)
where c, are tabulated in the above paper.
(b)  Parametric tests

The skewness–kurtosis test
The most widely used parametric test for normality is the skewness–kurtosis test. The parametric alternative in this test comes in the form of the Pearson family of densities.
The Pearson family of distributions is based on the differential equation

d ln f(z)/dz = (z − a)/(c₀ + c₁z + c₂z²),   (21.42)
whose solution for different values of (a, c₀, c₁, c₂) generates a large number of interesting distributions such as the gamma, beta and Student's t. It can be shown that knowledge of σ², α₃ and α₄ can be used to determine the distribution of Z within the Pearson family. In particular:

a = c₁ = (α₄ + 3)α₃σ/d,   (21.43)

c₀ = (4α₄ − 3α₃²)σ²/d,   (21.44)

c₂ = (2α₄ − 3α₃² − 6)/d,  d = 10α₄ − 12α₃² − 18   (21.45)
(see Kendall and Stuart (1969)). These parameters can be easily estimated using σ̂, α̂₃ and α̂₄, and then used to give us some idea about the nature of the departure from normality. Such information will be of considerable interest in tackling non-normality (see subsection (3)). In the case of normality c₁ = c₂ = 0 ⇒ α₃ = 0, α₄ = 3. Departures from normality within the Pearson family of particular interest are the following cases:
(a) c₂ = 0, c₁ ≠ 0. This gives rise to gamma-type distributions, with the chi-square an important member of this class of distributions. For

Z ~ χ²(m):  α₃ = √(8/m),  α₄ = 3 + 12/m,  m ≥ 1.   (21.46)
(b) c₁ = 0, c₀ > 0, c₂ > 0. An important member of this class of distributions is the Student's t. For Z ~ t(m): α₃ = 0, α₄ = 3 + 6/(m − 4), (m > 4).
(c) Beta-type distributions, directly related to the chi-square and F-distributions. In particular, if Z_i ~ χ²(m_i), i = 1, 2, and Z₁, Z₂ are independent, then

Z = Z₁/(Z₁ + Z₂) ~ B(m₁/2, m₂/2),   (21.47)

where B(m₁/2, m₂/2) denotes the beta distribution with parameters m₁/2 and m₂/2.
As argued above, normality within the Pearson family is characterised by

α₃ = (μ₃/σ³) = 0  and  α₄ = (μ₄/σ⁴) = 3.   (21.48)

It is interesting to note that (48) also characterises normality within the 'short' (first four moments) Gram–Charlier expansion:

g(z) = [1 + (1/6)α₃(z³ − 3z) + (1/24)(α₄ − 3)(z⁴ − 6z² + 3)]φ(z)   (21.49)

(see Section 10.6).
Bera and Jarque (1982), using the Pearson family as the parametric alternative, derived the following skewness–kurtosis test as a Lagrange multiplier test:

τ*(y) = T[(α̂₃²/6) + ((α̂₄ − 3)²/24)] ~ χ²(2),   (21.50)

where

α̂₃ = μ̂₃/(μ̂₂)^{3/2},  α̂₄ = μ̂₄/(μ̂₂)²,   (21.51)

μ̂_r = (1/T)Σ_{t=1}^T û_t^r,  r = 2, 3, 4.   (21.52)
The rejection region is defined by

C₁ = {y: τ*(y) > c_α},  ∫_{c_α}^∞ dχ²(2) = α.   (21.53)
A less formal derivation of the test can be based on the asymptotic distributions of α̂₃ and α̂₄ under H₀:

√T α̂₃ ~ N(0, 6),   (21.54)

√T(α̂₄ − 3) ~ N(0, 24).   (21.55)

With α̂₃ and α̂₄ being asymptotically independent (see Kendall and Stuart (1969)) we can add the squares of their standardised forms to derive (50); see Section 6.3.
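The statistic (21.50) is simple to compute from residuals; the sketch below (simulated residuals, hypothetical helper name) also reproduces the book's arithmetic for the money equation: with T = 80, α̂₃² = 0.005 and (α̂₄ − 3)² = 0.145, τ*(y) = 80(0.005/6 + 0.145/24) = 0.55:

```python
import numpy as np

def sk_kurt_test(resid):
    """tau*(y) = T[ a3^2/6 + (a4 - 3)^2/24 ] of (21.50); asymptotically
    chi-square(2) under normality, with 5% cut-off c_alpha = 5.99."""
    u = resid - resid.mean()
    m2 = np.mean(u ** 2)
    a3 = np.mean(u ** 3) / m2 ** 1.5      # skewness coefficient estimate
    a4 = np.mean(u ** 4) / m2 ** 2        # kurtosis coefficient estimate
    return len(resid) * (a3 ** 2 / 6.0 + (a4 - 3.0) ** 2 / 24.0)

# the book's arithmetic: T = 80, a3^2 = 0.005, (a4 - 3)^2 = 0.145
tau_book = 80 * (0.005 / 6.0 + 0.145 / 24.0)   # = 0.55

rng = np.random.default_rng(5)
tau = sk_kurt_test(rng.normal(size=1000))
```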
Let us consider the skewness–kurtosis test for the money equation

m_t = 2.896 + 0.690y_t + 0.865p_t − 0.055i_t + û_t,
     (1.034)  (0.105)   (0.020)   (0.013)  (0.039)

R² = 0.995, R̄² = 0.995, s = 0.0393, log L = 147.4,
T = 80, α̂₃² = 0.005, (α̂₄ − 3)² = 0.145.   (21.56)

Thus τ*(y) = 0.55, and since c_α = 5.99 for α = 0.05 we can deduce that, under the assumption that the other assumptions underlying the linear regression model are valid, the null hypothesis H₀: α₃ = 0 and α₄ = 3 is not rejected for α = 0.05.
There are several things to note about the above skewness–kurtosis test. Firstly, it is an asymptotic test and caution should be exercised when the sample size T is small. For higher-order approximations of the finite sample distribution of α̂₃ and α̂₄ see Pearson, D'Agostino and Bowman (1977), Bowman and Shenton (1975), inter alia. Secondly, the test is sensitive to 'outliers' ('unusually large' deviations). This can be both a blessing and a
hindrance. The first reaction of a practitioner whose residuals fail this normality test is to look for such outliers. When the apparent non-normality can be explained by the presence of these outliers the problem can be solved when the presence of the outliers can itself be explained.
Otherwise, alternative forms of tackling non-normality need to be considered, as discussed below. Thirdly, in the case where the standard error of the regression s is relatively large (because very little of the variation in y_t is actually explained), it can dominate the test statistic τ*(y). It will be suggested in Chapter 23 that the acceptance of normality in the case of the money equation above is largely due to this. Fourthly, rejection of normality using the skewness–kurtosis test gives us no information as to the nature of the departures from normality unless it is due to the presence of outliers.
A natural way to extend the skewness-kurtosis test is to include
cumulants of order higher than four which are zero under normality (see
Appendix 6.1).
(3)  Tackling non-normality
When the normality assumption is invalid there are two possible ways to proceed. One is to postulate a more appropriate distribution for D(y_t/X_t; θ) and respecify the linear regression model accordingly. This option is rarely considered, however, because most of the results in this context are developed under the normality assumption. For this reason the second way to proceed, based on normalising transformations, is by far the most commonly used way to tackle non-normality. This approach amounts to applying a transformation to y_t or/and X_t so as to induce normality. Because of the relationship between normality, linearity and homoskedasticity these transformations commonly induce linearity and homoskedasticity as well.
One of the most interesting families of transformations in this context is the Box–Cox (1964) transformation. For an arbitrary random variable Z the Box–Cox transformation takes the form

Z* = (Z^δ − 1)/δ,  δ ≠ 0.   (21.57)

Of particular interest are the three cases:

(i)   δ = −1,  Z* = Z^{−1} — reciprocal;   (21.58)
(ii)  δ = 0.5, Z* = (Z)^{1/2} — square root;   (21.59)
(iii) δ = 0,   Z* = log_e Z — logarithmic   (21.60)

(note: lim_{δ→0} Z* = log_e Z).
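The three special cases, and the δ → 0 limit in the note, can be checked numerically (illustrative values only):

```python
import numpy as np

def box_cox(z, delta):
    """Box-Cox transform (21.57); the delta = 0 case is its limit, log_e z."""
    z = np.asarray(z, dtype=float)
    if delta == 0:
        return np.log(z)
    return (z ** delta - 1.0) / delta

z = np.array([0.5, 1.0, 2.0, 4.0])
recip = box_cox(z, -1.0)      # 1 - 1/z: a monotone rescaling of the reciprocal
root = box_cox(z, 0.5)        # 2(sqrt(z) - 1): the square-root case
logs = box_cox(z, 0.0)        # the logarithmic case
near0 = box_cox(z, 1e-8)      # tends to the log as delta -> 0
```

The shift and rescaling by δ leave each case a monotone version of the named transformation, which is why they are interchangeable for inducing symmetry.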
The first two cases are not commonly used in econometric modelling because of the difficulties involved in interpreting Z* in the context of an empirical econometric model. Often, however, the square-root transformation might be convenient as a homoskedasticity-inducing transformation. This is because certain economic time series exhibit variances which change with their trending mean (m_t), i.e. Var(Z_t) = m_t σ², t = 1, 2, ..., T. In such cases the square-root transformation can be used as a variance-stabilising one (see Appendix 21.1) since Var(Z_t*) ≈ σ².
The logarithmic transformation is of considerable interest in econometric modelling for a variety of reasons. Firstly, for a random variable Z_t whose distribution is closer to the log-normal, gamma or chi-square (i.e. positively skewed), the distribution of log_e Z_t is approximately normal (see Johnson and Kotz (1970)). The log_e transformation induces 'near symmetry' to the original skewed distribution and allows Z* to take negative values even though Z could not. For economic data which take only positive values this can be a useful transformation to achieve near normality. Secondly, the log_e transformation can be used as a variance-stabilising transformation in the case where the heteroskedasticity takes the form

Var(y_t/X_t = x_t) = σ_t² = (μ_t)²σ²,  t = 1, 2, ..., T.   (21.61)

For y_t* = log_e y_t, Var(y_t*/X_t = x_t) ≈ σ², t = 1, 2, ..., T. Thirdly, the log transformation can be used to define useful economic concepts such as elasticities and growth rates. For example, in the case of the money equation considered above the variables are all in logarithmic form and the estimated coefficients can be interpreted as elasticities (assuming that the estimated equation constitutes a well-defined statistical model; a doubtful assumption). Moreover, the growth rate of Z_t, defined by Ż_t = (Z_t − Z_{t−1})/Z_{t−1}, can be approximated by Δlog_e Z_t = log_e Z_t − log_e Z_{t−1} because Δlog_e Z_t = log_e(1 + Ż_t) ≈ Ż_t.
In practice the Box–Cox transformation can be used with δ unspecified, letting the data determine its value (see Zarembka (1974)). For the money equation the original variables M_t, Y_t, P_t and I_t were used in the Box–Cox transformed equation:

(M_t^δ − 1)/δ = β₁ + β₂((Y_t^δ − 1)/δ) + β₃((P_t^δ − 1)/δ) + β₄((I_t^δ − 1)/δ) + u_t,   (21.62)
and allowed the data to determine the value of δ. The estimated δ value chosen was δ̂ = 0.530 and

β̂₁ = 0.252,  β̂₂ = 0.865,  β̂₃ = 0.005,  β̂₄ = −0.00007.
   (0.223)     (0.119)     (0.0001)     (0.00002)
Does this mean that the original logarithmic transformation is inappropriate? The answer is, not necessarily. This is because the estimated value of δ depends on the estimated equation being a well-defined statistical GM (no misspecification). In the money equation example there is enough evidence to suggest that various forms of misspecification are indeed present (see also Sections 21.3-7 and Chapter 22).
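Letting the data determine δ can be sketched as a grid search over the profile log-likelihood of the Box–Cox parameter. The sketch below uses a constant-only model with hypothetical log-normal data (not the money-equation series); the Jacobian term (δ − 1)Σ log z is what makes likelihoods at different δ values comparable:

```python
import numpy as np

def boxcox_loglik(z, delta):
    """Profile log-likelihood of the Box-Cox parameter for a constant-only
    model: -(T/2) log s2(delta) plus the Jacobian term (delta - 1) sum(log z)."""
    T = len(z)
    zt = np.log(z) if delta == 0 else (z ** delta - 1.0) / delta
    return -0.5 * T * np.log(zt.var()) + (delta - 1.0) * np.log(z).sum()

rng = np.random.default_rng(6)
z = np.exp(rng.normal(loc=2.0, scale=0.3, size=500))   # log z is exactly normal

grid = np.linspace(-1.0, 1.5, 251)
delta_hat = grid[np.argmax([boxcox_loglik(z, d) for d in grid])]
```

Since the data here are log-normal by construction, the maximising value lands near δ = 0, the logarithmic case; with misspecified data the estimate would drift, which is exactly the caveat raised in the text.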
The alternative way to tackle non-normality, by postulating a more appropriate form for the distribution of Z_t, remains largely unexplored. Most of the results in this direction are limited to multivariate distributions closely related to the normal, such as the elliptical family of distributions (see Section 21.3 below). On the question of robust estimation see Amemiya (1985).
21.3  Linearity
As argued above, the assumption

E(y_t/X_t = x_t) = β'x_t,   (21.63)

where β = Σ₂₂^{-1}σ₂₁, can be viewed as a consequence of the assumption that Z_t ~ N(0, Σ), t ∈ T (Z_t is a normal IID sequence of r.v.'s). The form of (63) is not as restrictive as it seems at first sight because E(y_t/X_t* = x_t*) can be non-linear in x_t* but linear in x_t = l(x_t*), where l(·) is a well-behaved transformation such as x_t = log x_t* or x_t = (x_t*)². Moreover, terms such as

c₀ + c₁t + c₂t² + ··· + c_n t^n

and
c₀ + Σ_{i=1}^h [a_i cos((2πi/h)t) + γ_i sin((2πi/h)t)],   (21.64)

purporting to model a time trend and seasonal effects respectively, can be easily accommodated as part of the constant. This can be justified in the context of the above analysis by extending Z_t ~ N(0, Σ), t ∈ T, to Z_t ~ N(m_t, Σ), t ∈ T, being an independent sequence of random vectors where the mean is a function of time and the covariance matrix is the same for all t ∈ T. The sequence of random vectors {Z_t, t ∈ T} in this case constitutes a non-stationary sequence (see Section 21.5 below). The non-linearities of interest in this section are the ones which cannot be accommodated into a linear conditional mean after transformation.
It is important to note that by postulating (63) without assuming normality of D(y_t, X_t; ψ), we limit the class of symmetric distributions to which D(y_t, X_t; ψ) could belong to that of elliptical distributions, denoted by EL(μ, Σ) (see Kelker (1970)). These distributions provide an extension of the multivariate normal distribution which preserves its bell-like shape and symmetry. Assuming that

(y_t, X_t')' ~ EL(0, Σ),  Σ = [σ₁₁  σ₁₂'; σ₂₁  Σ₂₂],   (21.65)

implies that

E(y_t/X_t = x_t) = σ₁₂'Σ₂₂^{-1}x_t   (21.66)

and

Var(y_t/X_t = x_t) = g(x_t)(σ₁₁ − σ₁₂'Σ₂₂^{-1}σ₂₁).   (21.67)

This shows that the assumption of linearity is not as sensitive to some departures from normality as the homoskedasticity assumption. Indeed, homoskedasticity of the conditional variance characterises the normal distribution within the class of elliptical distributions (see Chmielewski (1981)).
(1)  Implications of non-linearity
Let us consider the implications of non-linearity for the results of Chapter 19 related to the estimation, testing and prediction in the context of the linear regression model. In particular, what are the implications of assuming that D(Z_t; ψ) is not normal and

E(y_t/X_t = x_t) = h(x_t),   (21.68)

where h(x_t) ≠ β'x_t?
In Chapter 19 the statistical GM for the linear regression model was defined to be

y_t = β'x_t + u_t,   (21.69)

thinking that μ_t* = E(y_t/X_t = x_t) = β'x_t and u_t* = y_t − μ_t*, with E(u_t*/X_t = x_t) = 0, E(μ_t* u_t*/X_t = x_t) = 0 and E(u_t*²/X_t = x_t) = σ². The 'true' statistical GM, however, is

y_t = h(x_t) + ε_t,   (21.70)

where μ_t = E(y_t/X_t = x_t) = h(x_t) and ε_t = y_t − E(y_t/X_t = x_t). Comparing (69) and (70) we can see that the error term in the former,

u_t = y_t − β'x_t = h(x_t) − β'x_t + ε_t = g(x_t) + ε_t,

is no longer white noise. Moreover,
E(u_t/X_t = x_t) = g(x_t), E(u_t u_s) ≠ 0 and

E(u_t²/X_t = x_t) = g(x_t)² + σ².   (21.71)

In view of these properties of u_t we can deduce that, for

e ≡ (g(x₁), g(x₂), ..., g(x_T))',   (21.72)

E(β̂) = β + (X'X)^{-1}X'e ≠ β,

E(s²) = σ² + (e'M_x e)/(T − k),  M_x = I − X(X'X)^{-1}X',   (21.73)

because y = Xβ + e + ε, not y = Xβ + u. Moreover, β̂ and s² are also inconsistent estimators of β and σ² unless the approximation error e satisfies (1/T)X'e → 0 and (1/T)e'M_x e → 0 as T → ∞, respectively. That is, unless h(x_t) is not 'too' non-linear and the non-linearity decreases with T, β̂ and s² are inconsistent estimators of β and σ².
As we can see, the consequences of non-linearity are quite serious as far as the properties of β̂ and s² are concerned, these being biased and inconsistent estimators of β and σ² in general. What is more, the testing and prediction results derived in Chapter 19 are generally invalid in the case of non-linearity. In view of this the question arises as to what it is we are estimating by s² and β̂ in (70). Given that u_t = (h(x_t) − β'x_t) + ε_t, we can think of β̂ as an estimator of β*, where β* is the parameter which minimises the mean square error of u_t, i.e.

β* = arg min_β σ²(β),  where σ²(β) = E(u_t²).   (21.74)

This is because ∂[σ²(β)]/∂β = (−2)E[(h(x_t) − β'x_t)x_t'] = 0 (assuming that we can differentiate inside the expectation operator). Hence β* = [E(x_t x_t')]^{-1}E(h(x_t)x_t) ≡ Σ₂₂^{-1}σ₂₁*, say. Moreover, s² can be viewed as the natural estimator of σ²(β*). That is, β̂ and s² are the natural estimators of a least-squares approximation β*'x_t to the unknown function h(x_t) and the least-squares approximation error respectively. What is more, we can show that β̂ →_P β* and s² →_P σ²(β*) (see White (1980)).
(2)  Testing for non-linearity
In view of the serious implications of non-linearity for the results of Chapter 19 it is important to be able to test for departures from the linearity assumption. In particular we need to construct tests for

H₀: E(y_t/X_t = x_t) = β'x_t   (21.75)

against

H₁: E(y_t/X_t = x_t) = h(x_t).   (21.76)
This, however, raises the question of postulating a particular functional form for h(x_t), which is not available unless we are prepared to assume a particular form for D(Z_t; ψ). Alternatively, we could use the parametrisation related to the Kolmogorov–Gabor and systematic component polynomials introduced in Section 21.1.
Using, say, a third-order Kolmogorov–Gabor polynomial (KG(3)) we can postulate the alternative statistical GM:

y_t = β₀′x_t + γ₂′ψ₂t + γ₃′ψ₃t + ε_t,   (21.77)

where ψ₂t includes the second-order terms

x_it x_jt,  i ≥ j,  i, j = 2, 3, …, k,   (21.78)

and ψ₃t the third-order terms

x_it x_jt x_lt,  i ≥ j ≥ l,  i, j, l = 2, 3, …, k.   (21.79)

Note that x_1t is assumed to be the constant.
Assuming that T is large enough to enable us to estimate (77) we can test
linearity in the form of:
H₀: γ₂ = 0 and γ₃ = 0,   H₁: γ₂ ≠ 0 or γ₃ ≠ 0,
using the usual F-type test (see Section 21.1). An asymptotically equivalent
test can be based on the R? of the auxiliary regression:
û_t = (β₀ − β)′x_t + γ₂′ψ₂t + γ₃′ψ₃t + ε_t,   (21.80)

using the Lagrange multiplier test statistic

LM(y) = TR² = T((RRSS − URSS)/RRSS) ~ χ²(q) asymptotically,   (21.81)

q being the number of restrictions (see Engle (1984)). Its rejection region is

C₁ = {y: LM(y) ≥ c_α},  where ∫_{c_α}^{∞} dχ²(q) = α.
For small T the F-type test is preferable in practice because of the degrees of freedom adjustment; see Section 19.5.
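The mechanics of the F-type and LM versions of the test can be sketched as follows; the data-generating process and the particular higher-order terms added are illustrative assumptions, not the money equation of the text.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from OLS of y on the columns of X."""
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return e @ e

rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
y = 1.0 + x + 0.8 * x**2 + rng.normal(scale=0.5, size=T)  # invented non-linear mean

X0 = np.column_stack([np.ones(T), x])        # restricted: the linear GM
X1 = np.column_stack([X0, x**2, x**3])       # unrestricted: adds KG-type terms
q, k = 2, X1.shape[1]                        # q restrictions, k columns in X1

RRSS, URSS = rss(y, X0), rss(y, X1)
F  = ((RRSS - URSS) / q) / (URSS / (T - k))  # F-type statistic, F(q, T-k) under H0
LM = T * (RRSS - URSS) / RRSS                # LM = T*R^2 of the auxiliary regression
print(F, LM)
```

Because the restricted regressors are a subset of the unrestricted ones, the R² of the auxiliary regression of the residuals û_t on all the regressors equals (RRSS − URSS)/RRSS, which is why TR² takes the form above.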
Using the polynomial in μ̂_t we can postulate the alternative GM of the form:

y_t = β′x_t + c₂μ̂_t² + c₃μ̂_t³ + ··· + c_m μ̂_t^m + v_t,   (21.82)

where μ̂_t = β̂′x_t. A direct comparison between (75) and (82) gives rise to a
RESET type test (see Ramsey (1974)) for linearity based on H₀: c₂ = c₃ = ··· = c_m = 0, H₁: c_i ≠ 0, i = 2, …, m. Again this can be tested using the F-type test or the LM test, both based on the auxiliary regression:

û_t = (β₁ − β)′x_t + Σ_{i=2}^{m} c_i μ̂_t^i + v_t.   (21.83)
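A sketch of a RESET-type computation in the spirit of (82)–(83), using powers of the fitted values from the linear fit; the conditional mean below is an invented example, not data from the text.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from OLS of y on the columns of X."""
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return e @ e

rng = np.random.default_rng(2)
T = 200
x = rng.normal(size=T)
y = np.exp(0.5 * x) + rng.normal(scale=0.2, size=T)   # invented non-linear mean

X0 = np.column_stack([np.ones(T), x])                 # the linear GM
mu = X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]       # fitted values mu_hat_t

# Augment with powers of the fitted values, m = 4 (so q = 3 restrictions here).
X1 = np.column_stack([X0, mu**2, mu**3, mu**4])
q, k = 3, X1.shape[1]

RRSS, URSS = rss(y, X0), rss(y, X1)
F = ((RRSS - URSS) / q) / (URSS / (T - k))            # F(q, T-k) under H0
print(F)
```

The only regressors added are functions of the single index μ̂_t, which is what makes the RESET alternative more restrictive, and cheaper in degrees of freedom, than the KG(3) alternative (77).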
Let us apply these tests to the money equation estimated in Section 19.4. The F-test based on (77) with terms up to third order (but excluding … because of collinearity with y_t) yielded:

FT(y) = ((0.117520 − 0.045477)/0.045477)(67/9) = 11.79.

Given that c_α = 2.02, the null hypothesis of linearity is strongly rejected.
Similarly, the RESET type test based on (82) with m = 4 (excluding μ̂_t² because of collinearity with μ̂_t) yielded:

FT(y) = ((0.117520 − 0.060280)/0.060280)(74/2) = 35.13.

Again, with c_α = 3.12, linearity is strongly rejected.
It is important to note that although the RESET type test is based on a
more restrictive form of the alternative (compare (77) with (82)) it might be
the only test available in the case where the degrees of freedom are at a
premium (see Chapter 23).
(3)
Tackling non-linearity
As argued in Section 21.1 the results of the various misspecification tests
should be considered simultaneously because the assumptions are closely
interrelated. For example in the case of the estimated money equation it is
highly likely that the linearity assumption was rejected because the independent sample assumption [8] is invalid. In cases, however, where the source of the departure is indeed the normality assumption (leading to non-linearity) we need to consider the question of how to proceed by relaxing the normality of {Z_t, t ∈ T}. One way to proceed is to postulate a general distribution D(y_t, X_t; ψ) and derive the specific form of the conditional expectation

E(y_t/X_t = x_t) = h(x_t).   (21.84)
Choosing the form of D(y_t, X_t; ψ) will determine both the form of the conditional expectation as well as the conditional variance (see Chapter 7).
An alternative way to proceed is to use some normalising transformation
on the original variables y_t and X_t so as to ensure that the transformed variables y_t* and X_t* are indeed jointly normal, and hence

E(y_t*/X_t* = x_t*) = β*′x_t*   (21.85)

and

Var(y_t*/X_t* = x_t*) = σ*².   (21.86)
The transformations considered in Section 21.2 in relation to normality are
also directly related to the problem of non-linearity. The Box–Cox transformation can be used with different values of θ for each random variable involved to linearise highly non-linear functional forms. In such a case the transformed r.v.'s take the general form

x_it* = (x_it^{θ_i} − 1)/θ_i,   i = 1, 2, …, k   (21.87)
(see Box and Tidwell (1962)).
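As a small illustration of how a transformation of the kind in (87) can linearise a relationship, take θ = 0 (the logarithmic limit of the Box–Cox transform) applied to both variables of a made-up multiplicative model; the data and parameter values below are invented.

```python
import numpy as np

def box_cox(z, theta):
    """Box-Cox transform (z**theta - 1)/theta, with log as the theta -> 0 limit."""
    return np.log(z) if theta == 0 else (z**theta - 1.0) / theta

def r2(y, X):
    """R-squared from OLS of y on the columns of X."""
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1.0 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(3)
T = 500
x = rng.uniform(0.5, 5.0, size=T)
y = 2.0 * x**3 * np.exp(rng.normal(scale=0.05, size=T))  # multiplicative model

r2_raw = r2(y, np.column_stack([np.ones(T), x]))         # linear in the originals
r2_trf = r2(box_cox(y, 0),
            np.column_stack([np.ones(T), box_cox(x, 0)]))  # linear after transform
print(r2_raw, r2_trf)
```

After transforming both variables the model is exactly linear in the parameters (log y = log 2 + 3 log x + ε), so the fit of the linear regression improves sharply.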
In practice non-linear regression models are used in conjunction with the
normality of the conditional distribution (see Judge et al. (1985), inter alia).
The question which naturally arises is, 'how can we reconcile the non-linearity of the conditional expectation and the normality of D(y_t/X_t; θ)?' As mentioned in Section 19.2, the linearity of μ_t = E(y_t/X_t = x_t) is a direct consequence of the normality of the joint distribution D(y_t, X_t; ψ). One way the non-linearity of E(y_t/X_t = x_t) and the normality of D(y_t/X_t; θ) can be
reconciled is to argue that the conditional distribution is normal in the
transformed variables X_t* = h(X_t), i.e. D(y_t/X_t* = x_t*; θ) linear in x_t* but non-linear in x_t, i.e.

E(y_t/X_t = x_t) = g(x_t, γ).   (21.88)
Moreover, the parameters of interest are not the linear regression parameters θ = (β, σ²) but φ = (γ, σ²). It must be emphasised that non-linearity in the present context refers to both non-linearity in the parameters (γ) and in the variables (x_t).
Non-linear regression models based on the statistical GM:
y_t = g(x_t, γ) + ε_t   (21.89)

can be estimated by least-squares based on the minimisation of

S(γ) = Σ_{t=1}^{T} (y_t − g(x_t, γ))².   (21.90)