CHAPTER 21

The linear regression model III — departures from the assumptions underlying the probability model
The purpose of this chapter is to consider various forms of departures from the assumptions of the probability model:

[6]  (i)   D(y_t/X_t; θ) is normal;
     (ii)  E(y_t/X_t = x_t) = β'x_t, linear in x_t;
     (iii) Var(y_t/X_t = x_t) = σ², homoskedastic;
[7]  θ = (β, σ²) are time-invariant.
In each of the Sections 2-5 the above assumptions will be relaxed one at a time, retaining the others, and the following interrelated questions will be discussed:
(a) what are the implications of the departures considered?
(b) how do we detect such departures? and
(c) how do we proceed if departures are detected?
It is important to note at the outset that the following discussion, which considers individual assumptions being relaxed separately, limits the scope of misspecification analysis because it is rather rare to encounter such conditions in practice. More often than not various assumptions are invalid simultaneously. This is considered in more detail in Section 1. Section 6 discusses the problem of structural change which constitutes a particularly important form of departure from [7].
21.1  Misspecification testing and auxiliary regressions
Misspecification testing refers to the testing of the assumptions underlying a statistical model. In its context the null hypothesis is uniquely defined as the assumption(s) in question being valid. The alternative takes a particular form of departure from the null which is invariably non-unique. This is
because departures from a given assumption can take numerous forms with
the specified alternative being only one such form. Moreover, most
misspecification tests are based on the questionable presupposition that the
other assumptions of the model are valid. This is because joint
misspecification testing is considerably more involved. For these reasons
the choice in a misspecification test is between rejecting and not rejecting
the null; accepting the alternative should be excluded at this stage.
An important implication for the question on how to proceed if the null is
rejected is that before any action is taken the results of the other
misspecification tests should also be considered. It is often the case that a
particular form of departure from one assumption might also affect other
assumptions. For example when the assumption of sample independence
[8] is invalid the other misspecification tests are influenced (see Chapter 22).
In general the way to proceed when any of the assumptions [6]-[8] are invalid is first to narrow down the source of the departures by relating them back to the NIID assumption of {Z_t, t ∈ T} and then respecify the model taking into account the departure from NIID. The respecification of the model involves a reconsideration of the reduction from D(Z_1, Z_2, ..., Z_T; ψ) to D(y_t/X_t; θ) so as to account for the departures from the assumptions involved. As argued in Chapters 19-20 this reduction comes in the form of:
D(Z_1, ..., Z_T; ψ) = ∏_{t=1}^T D(Z_t; ψ)   (21.1)

                    = ∏_{t=1}^T D(y_t/X_t; ψ_1) D(X_t; ψ_2),   (21.2)
and involves the independence and the identically distributed assumptions in (1). The normality assumption plays an important role in defining the parametrisation of interest θ = (β, σ²) as well as the weak exogeneity condition. Once the source of the detected departure is related to one or more of the NIID assumptions the respecification takes the form of an alternative reduction. This is illustrated most vividly in Chapter 22 where assumption [8] is discussed. It turns out that when [8] is invalid not only are the results in Chapter 19 invalid but the other misspecification tests are 'largely' inappropriate as well. For this reason it is advisable in practice to test assumption [8] first and then proceed with the other assumptions if [8] is not rejected. The sequence of misspecification tests considered in what follows is chosen only for expositional purposes.
With the above discussion in mind let us consider the question of general
procedures for the derivation of misspecification tests. In cases where the
alternative in a misspecification test is given a specific parametric form the
various procedures encountered in specification testing (F-type tests, Wald,
Lagrange multiplier and likelihood ratio) can be easily adapted to apply in
the present context. In addition to these procedures several specific
misspecification test procedures have been proposed in the literature (see
White (1982), Bierens (1982), inter alia). Of particular interest in the present
book are the procedures based on the ‘omitted variables’ argument which
lead to auxiliary regressions (see Ramsey (1969), (1974), Pagan and Hall
(1983), Pagan (1984), inter alia). This particular procedure is given a prominent role in what follows because it is easy to implement in practice and it provides a common-sense interpretation of most other misspecification tests.
The ‘omitted variables’ argument was criticised in Section 20.2 because it
was based on the comparison of two ‘non-comparable’ statistical GM’s.
This was because the information sets underlying the latter were different. It
was argued, however, that the argument could be reformulated by
postulating the same sample information sets. In particular, if both parametrisations can be derived from D(Z_1, Z_2, ..., Z_T; ψ) by using alternative reduction arguments then the two statistical GM's can be made comparable.
Let {Z_t, t ∈ T} be a vector stochastic process defined on the probability space (S, ℱ, P(·)) which includes the stochastic variables of interest. In Chapter 17 it was argued that for a given 𝒢_t ⊂ ℱ,

y_t = E(y_t/𝒢_t) + u_t,  t ∈ T,   (21.3)

defines a general statistical GM with

μ_t = E(y_t/𝒢_t)  and  u_t = y_t − E(y_t/𝒢_t)   (21.4)

satisfying some desirable properties by construction, including the orthogonality condition:

E(μ_t u_t) = 0,  t ∈ T.   (21.5)

It is important to note, however, that (3)-(4) as defined above are just 'empty boxes'. These are filled when {Z_t, t ∈ T} is given a specific probabilistic structure such as NIID. In the latter case (3)-(4) take the specific forms:
y_t = β'x_t + u_t*,  t ∈ T,   (21.6)

μ_t* = β'x_t  and  u_t* = y_t − β'x_t,   (21.7)

with the conditioning information set being

𝒢_t = {X_t = x_t}.   (21.8)

When any of the assumptions in NIID are invalid, however, the various properties of μ_t and u_t no longer hold for μ_t* and u_t*. In particular the
orthogonality condition (5) is invalid. The non-orthogonality

E(μ_t* u_t*) ≠ 0,  t ∈ T,   (21.9)
can be used to derive various misspecification tests. If we specify the
alternative in a parametric form which includes the null as a special case (9)
could be used to derive misspecification tests based on certain auxiliary
regressions.
In order to illustrate this procedure let us consider two important parametric forms which can provide the basis of several misspecification tests:

(a)  g*(x_t) = Σ_{i=2}^m γ_i μ_t^i,  μ_t = β'x_t;   (21.10)

(b)  g(x_t) = a_0 + Σ_{i=1}^k b_i x_it + Σ_{i=1}^k Σ_{j≥i}^k c_ij x_it x_jt + Σ_{i=1}^k Σ_{j≥i}^k Σ_{l≥j}^k d_ijl x_it x_jt x_lt.   (21.11)
The polynomial g*(x_t) is related to RESET type tests (see Ramsey (1969)) and g(x_t) is known as the Kolmogorov–Gabor polynomial (see Ivakhnenko (1984)). Both of these polynomials can be used to specify a general parametric form for the alternative systematic component:

μ_t = β_0'x_t + γ_0'z_t*,   (21.12)

where z_t* represents known functions of the variables Z_{t−1}, ..., Z_1, X_t. This gives rise to the alternative statistical GM

y_t = β_0'x_t + γ_0'z_t* + ε_t,  t ∈ T,   (21.13)
which includes (6) as a special case under

H_0: γ_0 = 0,  with H_1: γ_0 ≠ 0.   (21.14)

A direct comparison between (13) and (6) gives rise to the auxiliary regression

u_t* = (β_0 − β)'x_t + γ_0'z_t* + ε_t,   (21.15)

whose operational form

û_t = (β_0 − b)'x_t + γ_0'z_t* + ε_t   (21.16)
can be used to test (14) directly. The most obvious test is the F-type test
discussed in Sections 19.5 and 20.3. The F-test will take the general form
FT(y) = ((RRSS − URSS)/URSS)((T − k*)/m),   (21.17)
where RRSS and URSS refer to the residual sums of squares from (6) and (16) (or (13)) respectively, k* being the number of parameters in (13) and m the number of restrictions.
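As a concrete illustration (not from the original text), the F-type comparison in (21.17) can be sketched in a few lines of numpy; the function name `ftype_misspec_test` and the simulated data are hypothetical:

```python
import numpy as np

def ftype_misspec_test(y, X, Z):
    """F-type statistic (21.17): compare the restricted GM (regressors X)
    with the augmented one (regressors [X, Z]), Z holding the z_t* terms."""
    def rss(W):
        b, *_ = np.linalg.lstsq(W, y, rcond=None)
        e = y - W @ b
        return float(e @ e)
    W = np.hstack([X, Z])
    rrss, urss = rss(X), rss(W)
    T, kstar = W.shape            # k*: parameters in the augmented GM
    m = Z.shape[1]                # m: number of restrictions tested
    return ((rrss - urss) / urss) * ((T - kstar) / m)

rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=T)   # the null GM is true here
Z = X[:, 1:] ** 2                                   # a candidate omitted term
FT = ftype_misspec_test(y, X, Z)
```

Since the augmented regression can only lower the residual sum of squares, the statistic is non-negative by construction; under the null it should be small relative to the F(m, T − k*) critical value.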
This procedure could be easily extended to the higher central moments of y_t:

E(u_t^r/X_t = x_t),  r > 2.   (21.18)
For further discussion see Spanos (1985b).
21.2  Normality
As argued above, the assumptions underlying the probability model are all interrelated and they stem from the fact that D(y_t, X_t; ψ) is assumed to be multivariate normal. When D(y_t, X_t; ψ) is assumed to be some other multivariate distribution the regression function takes a more general form (not necessarily linear),

E(y_t/X_t = x_t) = h(ψ, x_t),   (21.19)

and the skedasticity function is not necessarily free of x_t,

Var(y_t/X_t = x_t) = g(ψ, x_t).   (21.20)
Several examples of regression and skedasticity functions in the bivariate
case were considered in Chapter 7. In this section, however, we are going to
consider relaxing the assumption of normality only, keeping linearity and
homoskedasticity. In particular we will consider the consequences of
assuming

(y_t/X_t = x_t) ~ D(β'x_t, σ²),   (21.21)
where D(-) is an unknown distribution, and discuss the problem of testing
whether D(-) is in fact normal or not.
(1)  Consequences of non-normality
Let us consider the effect of the non-normality assumption in (21) on the
specification, estimation and testing in the context of the linear regression
model discussed in Chapter 19.
As far as specification (see Section 19.2) is concerned only marginal changes are needed. After removing assumption [6](i) the other assumptions can be reinterpreted in terms of D(β'x_t, σ²). This suggests that
relaxing normality but retaining linearity and homoskedasticity might not
constitute a major break from the linear regression framework.
The first casualty of (21) as far as estimation (see Section 19.4) is concerned is the method of maximum likelihood itself, which cannot be used unless the form of D(·) is known. We could, however, use the least-squares method of estimation briefly discussed in Section 13.1, where the form of the underlying distribution is 'apparently' not needed.

Least-squares is an alternative method of estimation which is historically much older than maximum likelihood or the method of moments. The least-squares method estimates the unknown parameters θ by minimising the squares of the distance between the observable random variables y_t, t ∈ T, and h_t(θ) (a function of θ purporting to approximate the mechanism giving rise to the observed values y_t), weighted by a precision factor 1/κ_t which is assumed known, i.e.

min_{θ∈Θ} Σ_t ((y_t − h_t(θ))/κ_t)².   (21.22)
It is interesting to note that this method was first suggested by Gauss in 1794
as an alternative to maximising what we, nowadays, call the log-likelihood
function under the normality assumption (see Section 13.1 for more details).
In an attempt to motivate the least-squares method he argued that:
the most probable value of the desired parameters will be that in which the
sum of the squares of differences between the actually observed and
computed values multiplied by numbers that measure the degree of
precision, is a minimum ....
This clearly shows a direct relationship between the normality assumption
and the least-squares method of estimation. It can be argued, however, that
the least-squares method can be applied to estimation problems without
assuming normality. In relation to such an argument Pearson (1920)
warned that:
we can only assert that the least-squares methods are theoretically
accurate on the assumption that our observations ... obey the normal law.
. Hence in disregarding normal distributions and claiming great
generality ... by merely using the principle of least-squares ... the
apparent generalisation has been gained merely at the expense of
theoretical validity ....
Despite this forceful argument let us consider the estimation of the linear
regression model without assuming normality, but retaining linearity and
homoskedasticity as in (21).
The least-squares method suggests minimising

l(β) = Σ_{t=1}^T ((y_t − β'x_t)/σ)²,   (21.23)

or, equivalently:

l(β) = Σ_{t=1}^T (y_t − β'x_t)² = (y − Xβ)'(y − Xβ),   (21.24)

∂l(β)/∂β = −2X'(y − Xβ) = 0.   (21.25)
Solving the system of normal equations (25) (assuming that rank(X) = k) we get the ordinary least-squares (OLS) estimator of β:

b = (X'X)^{-1}X'y.   (21.26)

The OLS estimator of σ² is

s² = (1/(T − k))(y − Xb)'(y − Xb).   (21.27)

Let us consider the properties of the OLS estimators b and s² in view of the fact that the form of D(β'x_t, σ²) is not known.
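The OLS formulae (21.26)-(21.27) are straightforward to compute; a minimal numpy sketch with simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
beta = np.array([1.0, -0.5, 2.0])
y = X @ beta + rng.normal(scale=0.5, size=T)

b = np.linalg.solve(X.T @ X, X.T @ y)    # b = (X'X)^{-1} X'y, as in (26)
resid = y - X @ b
s2 = resid @ resid / (T - k)             # s^2, as in (27)
```

Solving the normal equations directly (rather than inverting X'X) is the standard numerically stable route; by construction the residuals satisfy X'(y − Xb) = 0, i.e. equation (25).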
Finite sample properties of b and s²

Although b is identical to β̂ (the MLE of β) the similarity does not extend to their properties unless D(y_t/X_t; θ) is normal.
(a) Since b = Ly, with L = (X'X)^{-1}X', the OLS estimator is linear in y.
(b) Using the properties of the expectation operator E(·) we can deduce that E(b) = E(β + Lu) = β + LE(u) = β, i.e. b is an unbiased estimator of β.
(c) E(b − β)(b − β)' = E(Luu'L') = σ²LL' = σ²(X'X)^{-1}.
Given that we have the mean and variance of b but not its distribution, what
other properties can we deduce?
Clearly, we cannot say anything about sufficiency or full efficiency without knowing D(y_t/X_t; θ), but hopefully we could discuss relative efficiency within the class of estimators satisfying (a) and (b). The Gauss–Markov theorem provides us with such a result.

Gauss–Markov theorem
Under the assumption (21), b, the OLS estimator of β, has minimum variance among the class of linear and unbiased estimators (for a proof see Judge et al. (1982)).
(d) As far as s² is concerned, we can show that

E(s²) = σ², i.e. s² is an unbiased estimator of σ²,

using only the properties of the expectation operator relative to D(β'x_t, σ²).
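Properties (b) and (d) indeed require no normality; a small Monte Carlo sketch (hypothetical fixed design, with uniform and hence clearly non-normal errors) illustrates the unbiasedness of b:

```python
import numpy as np

rng = np.random.default_rng(2)
T, R = 50, 2000
X = np.column_stack([np.ones(T), np.linspace(-1.0, 1.0, T)])  # fixed design
beta = np.array([0.5, 1.5])
L = np.linalg.solve(X.T @ X, X.T)        # b = Ly with L = (X'X)^{-1}X'

draws = np.empty((R, 2))
for r in range(R):
    # uniform errors: mean zero and homoskedastic, but not normal
    u = rng.uniform(-1.0, 1.0, size=T)
    draws[r] = L @ (X @ beta + u)

b_mean = draws.mean(axis=0)              # should be close to beta
```

Averaged over the replications, b is centred on β, as (b) asserts, even though each error draw is far from Gaussian.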
In order to test any hypotheses or set up confidence intervals for
θ = (β, σ²) we need the distribution of the OLS estimators b and s². Thus, unless we specify the form of D(β'x_t, σ²), no test and/or confidence interval statistics can be derived. The question which naturally arises is to what extent 'asymptotic theory' can at least provide us with large sample results.
Asymptotic distribution of b and s²

Lemma 21.1
Under assumption (21),

√T(b − β) ~ N(0, σ²Q_x^{-1}),   (21.28)

where

Q_x = lim_{T→∞}(X'X/T)   (21.29)

is finite and non-singular.
Lemma 21.2
Under (21) we can deduce that

√T(s² − σ²) ~ N(0, μ₄ − σ⁴),   (21.30)

where μ₄ refers to the fourth central moment of D(y_t/X_t; θ), assumed to be finite (see Schmidt (1976)). Note that in the case where D(y_t/X_t; θ) is normal,

μ₄ = 3σ⁴ ⇒ √T(s² − σ²) ~ N(0, 2σ⁴).   (21.31)
Lemma 21.3
Under (21),

b →_P β   (21.32)

(provided lim_{T→∞}(X'X)^{-1} = 0)   (21.33)

and

s² →_P σ².   (21.34)
From the above lemmas we can see that although the asymptotic distribution of b coincides with the asymptotic distribution of the MLE, this is not the case with s². The asymptotic distribution of b does not depend on
D(y_t/X_t; θ) but that of s² does via μ₄. The question which naturally arises is to what extent the various results related to tests about θ = (β, σ²) (see Section 19.5) are at least asymptotically justifiable. Let us consider the F-test for H_0: Rβ = r against H_1: Rβ ≠ r. From lemma 21.1 we can deduce that under H_0, √T(Rb − r) ~ N(0, σ²RQ_x^{-1}R'), which implies that

(Rb − r)'[σ²RQ_x^{-1}R'/T]^{-1}(Rb − r) ~ χ²(m).   (21.35)
Using this result in conjunction with lemma 21.3 we can deduce that

τ(y) = (1/m)(Rb − r)'[s²R(X'X)^{-1}R']^{-1}(Rb − r),  mτ(y) ~ χ²(m),   (21.36)

under H_0, and thus the F-test is robust with respect to the non-normality assumption (21) above. Although the asymptotic distribution of τ(y) is chi-square, in practice the F-distribution provides a better approximation for a small T (see Section 19.5). This is particularly true when D(β'x_t, σ²) has
heavy tails. The significance t-test, being a special case of the F-test,

τ_i(y) = b_i/(s√((X'X)^{-1}_{ii})) ~ N(0, 1)  under H_0: β_i = 0,   (21.37)

is also asymptotically justifiable and robust relative to the non-normality assumption (21) above.
Because of lemma 21.2, intuition suggests that the testing results in relation to σ² will not be robust relative to the non-normality assumption. Given that the asymptotic distribution of s² depends on μ₄, or α₄ = μ₄/σ⁴, the kurtosis coefficient, any departures from normality (where α₄ = 3) will seriously affect the results based on the normality assumption. In particular the size α and power of these tests can be very different from the ones based on the postulated value of α₄. This can seriously affect all tests which depend on the distribution of s², such as some heteroskedasticity and structural change tests (see Sections 21.4-21.6 below). In order to get non-normality robust tests in such cases we need to modify them to take account of μ₄.
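Lemma 21.2 can be illustrated by simulation: the variance of √T(s² − σ²) is close to μ₄ − σ⁴ (= 2σ⁴ under normality) but much larger for heavy-tailed data. A rough sketch, using sample variances of raw draws rather than regression residuals for simplicity:

```python
import numpy as np

rng = np.random.default_rng(3)
T, R = 500, 4000

def var_sqrtT_s2(draw):
    """Empirical variance of sqrt(T)(s^2 - sigma^2) over R replications."""
    s2 = np.array([np.var(draw(), ddof=1) for _ in range(R)])
    return T * s2.var()

# normal errors (unit variance): mu4 - sigma^4 = 3 - 1 = 2
v_normal = var_sqrtT_s2(lambda: rng.normal(size=T))
# Student's t(5) errors: alpha4 = 3 + 6/(5 - 4) = 9, so mu4 - sigma^4 is far larger
v_heavy = var_sqrtT_s2(lambda: rng.standard_t(df=5, size=T))
```

The contrast between the two figures is exactly why tests built on the distribution of s² lose their nominal size under heavy tails.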
(2)  Testing for departures from normality

Tests for normality can be divided into parametric and non-parametric tests depending on whether the alternative is given a parametric form or not.
(a)  Non-parametric tests

The Kolmogorov–Smirnov test
Based on the assumption that {u_t/X_t, t ∈ T} is an IID process we can use the results of Appendix 11.1 to construct a test with rejection region

C₁ = {y: √T D_T* > c_α},   (21.38)

where D_T* refers to the Kolmogorov–Smirnov test statistic in terms of the residuals. Typical values of c_α are:

α:    0.10   0.05   0.01
c_α:  1.23   1.36   1.67   (21.39)
For a most illuminating discussion of this and similar tests see Durbin
(1973).
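A minimal sketch of the statistic √T D_T* for standardised residuals, using only numpy and the error function; treating the estimated mean and variance as known makes this only an approximation to the test described above (hypothetical residuals, cut-off 1.36 from (39) at the 5% level):

```python
import numpy as np
from math import erf, sqrt

def ks_stat(resid):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    standardised residuals and the standard normal CDF."""
    z = np.sort((resid - resid.mean()) / resid.std(ddof=1))
    T = len(z)
    Phi = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])
    up = np.arange(1, T + 1) / T - Phi     # deviations above the normal CDF
    down = Phi - np.arange(0, T) / T       # and below it
    return max(up.max(), down.max())

rng = np.random.default_rng(4)
resid = rng.normal(size=400)                  # residuals of a well-specified model
stat = np.sqrt(len(resid)) * ks_stat(resid)   # compare with c_alpha, e.g. 1.36
```

Because the D statistic needs both one-sided deviations, the empirical CDF is evaluated just before and just after each jump.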
The Shapiro—Wilk test
This test is based on the ratio of two different estimators of the variance ơ?.
n
z=|
>
t=
2
1
Aer (Urey 1)
where ti1)
n=
T
T
| ), ap
(21.40)
it =1
are the ordered residuals,
if T iseven
or
T-1
n=——
if T is odd,
and a,; is a weight coefficient tabulated by Shapiro and Wilk (1965) for
sample sizes 2< 7< 50. The rejection region takes the form:
Cry=ty: W
(21.41)
where c, are tabulated in the above paper.
(b)  Parametric tests

The skewness–kurtosis test
The most widely used parametric test for normality is the skewness–kurtosis test. The parametric alternative in this test comes in the form of the Pearson family of densities.
The Pearson family of distributions is based on the differential equation

d ln f(z)/dz = (z − a)/(c₀ + c₁z + c₂z²),   (21.42)
whose solution for different values of (a, c₀, c₁, c₂) generates a large number of interesting distributions such as the gamma, beta and Student's t. It can be shown that knowledge of σ², α₃ and α₄ can be used to determine the distribution of Z within the Pearson family. In particular:

a = c₁ = (α₄ + 3)α₃σ/d,   (21.43)

c₀ = (4α₄ − 3α₃²)σ²/d,   (21.44)

c₂ = (2α₄ − 3α₃² − 6)/d,  d = 10α₄ − 12α₃² − 18   (21.45)
(see Kendall and Stuart (1969)). These parameters can be easily estimated using σ̂, α̂₃ and α̂₄, and then used to give us some idea about the nature of the departure from normality. Such information will be of considerable interest in tackling non-normality (see subsection (3)). In the case of normality c₁ = c₂ = 0 ⇒ α₃ = 0, α₄ = 3. Departures from normality within the Pearson family of particular interest are the following cases:
(a) c₂ = 0, c₁ ≠ 0. This gives rise to gamma-type distributions, with the chi-square an important member of this class of distributions. For

Z ~ χ²(m):  α₃ = √(8/m),  α₄ = 3 + 12/m,  m ≥ 1.   (21.46)
(b) c₁ = 0, c₀ > 0, c₂ > 0. An important member of this class of distributions is the Student's t. For Z ~ t(m): α₃ = 0, α₄ = 3 + 6/(m − 4), (m > 4).
(c) Beta-type distributions, directly related to the chi-square and F-distributions. In particular, if Z_i ~ χ²(m_i), i = 1, 2, and Z₁, Z₂ are independent, then

Z = Z₁/(Z₁ + Z₂) ~ B(m₁/2, m₂/2),   (21.47)

where B(m₁/2, m₂/2) denotes the beta distribution with parameters m₁/2 and m₂/2.
As argued above, normality within the Pearson family is characterised by

α₃ = (μ₃/σ³) = 0  and  α₄ = (μ₄/σ⁴) = 3.   (21.48)

It is interesting to note that (48) also characterises normality within the 'short' (first four moments) Gram–Charlier expansion:

g(z) = [1 + (1/6)α₃(z³ − 3z) + (1/24)(α₄ − 3)(z⁴ − 6z² + 3)]φ(z)   (21.49)

(see Section 10.6).
Bera and Jarque (1982), using the Pearson family as the parametric alternative, derived the following skewness–kurtosis test as a Lagrange multiplier test:

τ*(y) = T[(α̂₃²/6) + ((α̂₄ − 3)²/24)] ~ χ²(2),   (21.50)

where

α̂₃ = μ̂₃/(μ̂₂)^{3/2},  α̂₄ = μ̂₄/(μ̂₂)²,   (21.51)

μ̂_r = (1/T)Σ_{t=1}^T û_t^r,  r = 2, 3, 4.   (21.52)
The rejection region is defined by

C₁ = {y: τ*(y) > c_α},  ∫_{c_α}^∞ dχ²(2) = α.   (21.53)
A less formal derivation of the test can be based on the asymptotic distributions of α̂₃ and α̂₄ under H₀:

√T α̂₃ ~ N(0, 6),   (21.54)

√T(α̂₄ − 3) ~ N(0, 24).   (21.55)

With α̂₃ and α̂₄ being asymptotically independent (see Kendall and Stuart (1969)) we can add the squares of their standardised forms to derive (50); see Section 6.3.
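The statistic (21.50) is simple to compute from residuals; the sketch below (simulated residuals, hypothetical helper name) also reproduces the book's arithmetic for the money equation: with T = 80, α̂₃² = 0.005 and (α̂₄ − 3)² = 0.145, τ*(y) = 80(0.005/6 + 0.145/24) = 0.55:

```python
import numpy as np

def sk_kurt_test(resid):
    """tau*(y) = T[ a3^2/6 + (a4 - 3)^2/24 ] of (21.50); asymptotically
    chi-square(2) under normality, with 5% cut-off c_alpha = 5.99."""
    u = resid - resid.mean()
    m2 = np.mean(u ** 2)
    a3 = np.mean(u ** 3) / m2 ** 1.5      # skewness coefficient estimate
    a4 = np.mean(u ** 4) / m2 ** 2        # kurtosis coefficient estimate
    return len(resid) * (a3 ** 2 / 6.0 + (a4 - 3.0) ** 2 / 24.0)

# the book's arithmetic: T = 80, a3^2 = 0.005, (a4 - 3)^2 = 0.145
tau_book = 80 * (0.005 / 6.0 + 0.145 / 24.0)   # = 0.55

rng = np.random.default_rng(5)
tau = sk_kurt_test(rng.normal(size=1000))
```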
Let us consider the skewness–kurtosis test for the money equation

m_t = 2.896 + 0.690y_t + 0.865p_t − 0.055i_t + û_t,
     (1.034)  (0.105)   (0.020)   (0.013)  (0.039)

R² = 0.995, R̄² = 0.995, s = 0.0393, log L = 147.4,
T = 80, α̂₃² = 0.005, (α̂₄ − 3)² = 0.145.   (21.56)

Thus τ*(y) = 0.55, and since c_α = 5.99 for α = 0.05 we can deduce that, under the assumption that the other assumptions underlying the linear regression model are valid, the null hypothesis H₀: α₃ = 0 and α₄ = 3 is not rejected for α = 0.05.
There are several things to note about the above skewness–kurtosis test. Firstly, it is an asymptotic test and caution should be exercised when the sample size T is small. For higher-order approximations of the finite sample distribution of α̂₃ and α̂₄ see Pearson, D'Agostino and Bowman (1977), Bowman and Shenton (1975), inter alia. Secondly, the test is sensitive to 'outliers' ('unusually large' deviations). This can be both a blessing and a
hindrance. The first reaction of a practitioner whose residuals fail this normality test is to look for such outliers. When the apparent non-normality can be explained by the presence of these outliers the problem can be solved when the presence of the outliers can itself be explained.
Otherwise, alternative forms of tackling non-normality need to be considered, as discussed below. Thirdly, in the case where the standard error of the regression s is relatively large (because very little of the variation in y_t is actually explained), it can dominate the test statistic τ*(y). It will be suggested in Chapter 23 that the acceptance of normality in the case of the money equation above is largely due to this. Fourthly, rejection of normality using the skewness–kurtosis test gives us no information as to the nature of the departures from normality unless it is due to the presence of outliers.
A natural way to extend the skewness-kurtosis test is to include
cumulants of order higher than four which are zero under normality (see
Appendix 6.1).
(3)  Tackling non-normality
When the normality assumption is invalid there are two possible ways to proceed. One is to postulate a more appropriate distribution for D(y_t/X_t; θ) and respecify the linear regression model accordingly. This option is rarely considered, however, because most of the results in this context are developed under the normality assumption. For this reason the second way to proceed, based on normalising transformations, is by far the most commonly used way to tackle non-normality. This approach amounts to applying a transformation to y_t or/and X_t so as to induce normality. Because of the relationship between normality, linearity and homoskedasticity these transformations commonly induce linearity and homoskedasticity as well.
One of the most interesting families of transformations in this context is the Box–Cox (1964) transformation. For an arbitrary random variable Z the Box–Cox transformation takes the form

Z* = (Z^δ − 1)/δ,  δ ≠ 0.   (21.57)

Of particular interest are the three cases:

(i)   δ = −1,  Z* = Z^{−1} — reciprocal;   (21.58)
(ii)  δ = 0.5, Z* = (Z)^{1/2} — square root;   (21.59)
(iii) δ = 0,   Z* = log_e Z — logarithmic   (21.60)

(note: lim_{δ→0} Z* = log_e Z).
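The three special cases, and the δ → 0 limit in the note, can be checked numerically (illustrative values only):

```python
import numpy as np

def box_cox(z, delta):
    """Box-Cox transform (21.57); the delta = 0 case is its limit, log_e z."""
    z = np.asarray(z, dtype=float)
    if delta == 0:
        return np.log(z)
    return (z ** delta - 1.0) / delta

z = np.array([0.5, 1.0, 2.0, 4.0])
recip = box_cox(z, -1.0)      # 1 - 1/z: a monotone rescaling of the reciprocal
root = box_cox(z, 0.5)        # 2(sqrt(z) - 1): the square-root case
logs = box_cox(z, 0.0)        # the logarithmic case
near0 = box_cox(z, 1e-8)      # tends to the log as delta -> 0
```

The shift and rescaling by δ leave each case a monotone version of the named transformation, which is why they are interchangeable for inducing symmetry.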
The first two cases are not commonly used in econometric modelling because of the difficulties involved in interpreting Z* in the context of an empirical econometric model. Often, however, the square-root transformation might be convenient as a homoskedasticity-inducing transformation. This is because certain economic time series exhibit variances which change with their trending mean (m_t), i.e. Var(Z_t) = m_t σ², t = 1, 2, ..., T. In such cases the square-root transformation can be used as a variance-stabilising one (see Appendix 21.1) since Var(Z_t*) ≈ σ².
The logarithmic transformation is of considerable interest in econometric modelling for a variety of reasons. Firstly, for a random variable Z_t whose distribution is closer to the log-normal, gamma or chi-square (i.e. positively skewed), the distribution of log_e Z_t is approximately normal (see Johnson and Kotz (1970)). The log_e transformation induces 'near symmetry' to the original skewed distribution and allows Z* to take negative values even though Z could not. For economic data which take only positive values this can be a useful transformation to achieve near normality. Secondly, the log_e transformation can be used as a variance-stabilising transformation in the case where the heteroskedasticity takes the form

Var(y_t/X_t = x_t) = σ_t² = (μ_t)²σ²,  t = 1, 2, ..., T.   (21.61)

For y_t* = log_e y_t, Var(y_t*/X_t = x_t) ≈ σ², t = 1, 2, ..., T. Thirdly, the log transformation can be used to define useful economic concepts such as elasticities and growth rates. For example, in the case of the money equation considered above the variables are all in logarithmic form and the estimated coefficients can be interpreted as elasticities (assuming that the estimated equation constitutes a well-defined statistical model; a doubtful assumption). Moreover, the growth rate of Z_t, defined by Ż_t = (Z_t − Z_{t−1})/Z_{t−1}, can be approximated by Δlog_e Z_t = log_e Z_t − log_e Z_{t−1} because Δlog_e Z_t = log_e(1 + Ż_t) ≈ Ż_t.
In practice the Box–Cox transformation can be used with δ unspecified, letting the data determine its value (see Zarembka (1974)). For the money equation the original variables M_t, Y_t, P_t and I_t were used in the Box–Cox transformed equation:

(M_t^δ − 1)/δ = β₁ + β₂((Y_t^δ − 1)/δ) + β₃((P_t^δ − 1)/δ) + β₄((I_t^δ − 1)/δ) + u_t,   (21.62)
and allowed the data to determine the value of δ. The estimated δ value chosen was δ̂ = 0.530 and

β̂₁ = 0.252,  β̂₂ = 0.865,  β̂₃ = 0.005,  β̂₄ = −0.00007.
   (0.223)     (0.119)     (0.0001)     (0.00002)
Does this mean that the original logarithmic transformation is inappropriate? The answer is, not necessarily. This is because the estimated value of δ depends on the estimated equation being a well-defined statistical GM (no misspecification). In the money equation example there is enough evidence to suggest that various forms of misspecification are indeed present (see also Sections 21.3-7 and Chapter 22).
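Letting the data determine δ can be sketched as a grid search over the profile log-likelihood of the Box–Cox parameter. The sketch below uses a constant-only model with hypothetical log-normal data (not the money-equation series); the Jacobian term (δ − 1)Σ log z is what makes likelihoods at different δ values comparable:

```python
import numpy as np

def boxcox_loglik(z, delta):
    """Profile log-likelihood of the Box-Cox parameter for a constant-only
    model: -(T/2) log s2(delta) plus the Jacobian term (delta - 1) sum(log z)."""
    T = len(z)
    zt = np.log(z) if delta == 0 else (z ** delta - 1.0) / delta
    return -0.5 * T * np.log(zt.var()) + (delta - 1.0) * np.log(z).sum()

rng = np.random.default_rng(6)
z = np.exp(rng.normal(loc=2.0, scale=0.3, size=500))   # log z is exactly normal

grid = np.linspace(-1.0, 1.5, 251)
delta_hat = grid[np.argmax([boxcox_loglik(z, d) for d in grid])]
```

Since the data here are log-normal by construction, the maximising value lands near δ = 0, the logarithmic case; with misspecified data the estimate would drift, which is exactly the caveat raised in the text.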
The alternative way to tackle non-normality, by postulating a more appropriate form for the distribution of Z_t, remains largely unexplored. Most of the results in this direction are limited to multivariate distributions closely related to the normal, such as the elliptical family of distributions (see Section 21.3 below). On the question of robust estimation see Amemiya (1985).
21.3  Linearity
As argued above, the assumption

E(y_t/X_t = x_t) = β'x_t,   (21.63)

where β = Σ₂₂^{-1}σ₂₁, can be viewed as a consequence of the assumption that Z_t ~ N(0, Σ), t ∈ T (Z_t is a normal IID sequence of r.v.'s). The form of (63) is not as restrictive as it seems at first sight because E(y_t/X_t* = x_t*) can be non-linear in x_t* but linear in x_t = l(x_t*), where l(·) is a well-behaved transformation such as x_t = log x_t* or x_t = (x_t*)². Moreover, terms such as

c₀ + c₁t + c₂t² + ··· + c_n t^n

and
c₀ + Σ_{i=1}^h [a_i cos((2πi/h)t) + γ_i sin((2πi/h)t)],   (21.64)

purporting to model a time trend and seasonal effects respectively, can be easily accommodated as part of the constant. This can be justified in the context of the above analysis by extending Z_t ~ N(0, Σ), t ∈ T, to Z_t ~ N(m_t, Σ), t ∈ T, being an independent sequence of random vectors where the mean is a function of time and the covariance matrix is the same for all t ∈ T. The sequence of random vectors {Z_t, t ∈ T} in this case constitutes a non-stationary sequence (see Section 21.5 below). The non-linearities of interest in this section are the ones which cannot be accommodated into a linear conditional mean after transformation.
It is important to note that by postulating (63) without assuming normality of D(y_t, X_t; ψ), we limit the class of symmetric distributions to which D(y_t, X_t; ψ) could belong to that of elliptical distributions, denoted by EL(μ, Σ) (see Kelker (1970)). These distributions provide an extension of the multivariate normal distribution which preserves its bell-like shape and symmetry. Assuming that

(y_t, X_t')' ~ EL(0, Σ),  Σ = [σ₁₁  σ₁₂'; σ₂₁  Σ₂₂],   (21.65)

implies that

E(y_t/X_t = x_t) = σ₁₂'Σ₂₂^{-1}x_t   (21.66)

and

Var(y_t/X_t = x_t) = g(x_t)(σ₁₁ − σ₁₂'Σ₂₂^{-1}σ₂₁).   (21.67)

This shows that the assumption of linearity is not as sensitive to some departures from normality as the homoskedasticity assumption. Indeed, homoskedasticity of the conditional variance characterises the normal distribution within the class of elliptical distributions (see Chmielewski (1981)).
(1)  Implications of non-linearity
Let us consider the implications of non-linearity for the results of Chapter 19 related to the estimation, testing and prediction in the context of the linear regression model. In particular, what are the implications of assuming that D(Z_t; ψ) is not normal and

E(y_t/X_t = x_t) = h(x_t),   (21.68)

where h(x_t) ≠ β'x_t?
In Chapter 19 the statistical GM for the linear regression model was defined to be

y_t = β'x_t + u_t,   (21.69)

thinking that μ_t* = E(y_t/X_t = x_t) = β'x_t and u_t* = y_t − μ_t*, with E(u_t*/X_t = x_t) = 0, E(μ_t* u_t*/X_t = x_t) = 0 and E(u_t*²/X_t = x_t) = σ². The 'true' statistical GM, however, is

y_t = h(x_t) + ε_t,   (21.70)

where μ_t = E(y_t/X_t = x_t) = h(x_t) and ε_t = y_t − E(y_t/X_t = x_t). Comparing (69) and (70) we can see that the error term in the former,

u_t = y_t − β'x_t = h(x_t) − β'x_t + ε_t = g(x_t) + ε_t,

is no longer white noise. Moreover,
E(u_t/X_t = x_t) = g(x_t), E(u_t u_s) ≠ 0 and

E(u_t²/X_t = x_t) = g(x_t)² + σ².   (21.71)

In view of these properties of u_t we can deduce that, for

e ≡ (g(x₁), g(x₂), ..., g(x_T))',   (21.72)

E(β̂) = β + (X'X)^{-1}X'e ≠ β,

E(s²) = σ² + (e'M_x e)/(T − k),  M_x = I − X(X'X)^{-1}X',   (21.73)

because y = Xβ + e + ε, not y = Xβ + u. Moreover, β̂ and s² are also inconsistent estimators of β and σ² unless the approximation error e satisfies (1/T)X'e → 0 and (1/T)e'M_x e → 0 as T → ∞, respectively. That is, unless h(x_t) is not 'too' non-linear and the non-linearity decreases with T, β̂ and s² are inconsistent estimators of β and σ².
As we can see, the consequences of non-linearity are quite serious as far as the properties of β̂ and s² are concerned, these being biased and inconsistent estimators of β and σ² in general. What is more, the testing and prediction results derived in Chapter 19 are generally invalid in the case of non-linearity. In view of this the question arises as to what it is we are estimating by s² and β̂ in (70). Given that u_t = (h(x_t) − β'x_t) + ε_t, we can think of β̂ as an estimator of β*, where β* is the parameter which minimises the mean square error of u_t, i.e.

β* = arg min_β σ²(β),  where σ²(β) = E(u_t²).   (21.74)

This is because ∂[σ²(β)]/∂β = (−2)E[(h(x_t) − β'x_t)x_t'] = 0 (assuming that we can differentiate inside the expectation operator). Hence β* = [E(x_t x_t')]^{-1}E(h(x_t)x_t) ≡ Σ₂₂^{-1}σ₂₁*, say. Moreover, s² can be viewed as the natural estimator of σ²(β*). That is, β̂ and s² are the natural estimators of a least-squares approximation β*'x_t to the unknown function h(x_t) and the least-squares approximation error respectively. What is more, we can show that β̂ →_P β* and s² →_P σ²(β*) (see White (1980)).
(2)  Testing for non-linearity
In view of the serious implications of non-linearity for the results of Chapter 19 it is important to be able to test for departures from the linearity assumption. In particular we need to construct tests for

H₀: E(y_t/X_t = x_t) = β'x_t   (21.75)

against

H₁: E(y_t/X_t = x_t) = h(x_t).   (21.76)
This, however, raises the question of postulating a particular functional form for h(x_t), which is not available unless we are prepared to assume a particular form for D(Z_t; ψ). Alternatively, we could use the parametrisation related to the Kolmogorov–Gabor and systematic component polynomials introduced in Section 21.1.
Using, say, a third-order Kolmogorov–Gabor polynomial (KG(3)) we can postulate the alternative statistical GM:

y_t = β₀′x_t + γ₂′ψ₂t + γ₃′ψ₃t + ε_t,   (21.77)

where ψ₂t includes the second-order terms

x_it x_jt,  i ≥ j,  i, j = 2, 3, …, k,   (21.78)

and ψ₃t the third-order terms

x_it x_jt x_lt,  i ≥ j ≥ l,  i, j, l = 2, 3, …, k.   (21.79)

Note that x_1t is assumed to be the constant.
Assuming that T is large enough to enable us to estimate (77) we can test
linearity in the form of:
H₀: γ₂ = 0 and γ₃ = 0,   H₁: γ₂ ≠ 0 or γ₃ ≠ 0,
using the usual F-type test (see Section 21.1). An asymptotically equivalent
test can be based on the R? of the auxiliary regression:
û_t = (β₀ − β)′x_t + γ₂′ψ₂t + γ₃′ψ₃t + ε_t,   (21.80)

using the Lagrange multiplier test statistic

LM(y) = TR² = T((RRSS − URSS)/RRSS) ~ χ²(q) asymptotically,   (21.81)

q being the number of restrictions (see Engle (1984)). Its rejection region is

C₁ = {y: LM(y) ≥ c_α},  where ∫_{c_α}^{∞} dχ²(q) = α.
For small T the F-type test is preferable in practice because of the degrees of freedom adjustment; see Section 19.5.
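The mechanics of the F-type and LM versions of the test can be sketched as follows; the data-generating process and the particular higher-order terms added are illustrative assumptions, not the money equation of the text.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from OLS of y on the columns of X."""
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return e @ e

rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
y = 1.0 + x + 0.8 * x**2 + rng.normal(scale=0.5, size=T)  # invented non-linear mean

X0 = np.column_stack([np.ones(T), x])        # restricted: the linear GM
X1 = np.column_stack([X0, x**2, x**3])       # unrestricted: adds KG-type terms
q, k = 2, X1.shape[1]                        # q restrictions, k columns in X1

RRSS, URSS = rss(y, X0), rss(y, X1)
F  = ((RRSS - URSS) / q) / (URSS / (T - k))  # F-type statistic, F(q, T-k) under H0
LM = T * (RRSS - URSS) / RRSS                # LM = T*R^2 of the auxiliary regression
print(F, LM)
```

Because the restricted regressors are a subset of the unrestricted ones, the R² of the auxiliary regression of the residuals û_t on all the regressors equals (RRSS − URSS)/RRSS, which is why TR² takes the form above.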
Using the polynomial in μ̂_t we can postulate the alternative GM of the form:

y_t = β′x_t + c₂μ̂_t² + c₃μ̂_t³ + ··· + c_m μ̂_t^m + v_t,   (21.82)

where μ̂_t = β̂′x_t. A direct comparison between (75) and (82) gives rise to a
RESET type test (see Ramsey (1974)) for linearity based on H₀: c₂ = c₃ = ··· = c_m = 0, H₁: c_i ≠ 0, i = 2, …, m. Again this can be tested using the F-type test or the LM test, both based on the auxiliary regression:

û_t = (β₁ − β)′x_t + Σ_{i=2}^{m} c_i μ̂_t^i + v_t.   (21.83)
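A sketch of a RESET-type computation in the spirit of (82)–(83), using powers of the fitted values from the linear fit; the conditional mean below is an invented example, not data from the text.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from OLS of y on the columns of X."""
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return e @ e

rng = np.random.default_rng(2)
T = 200
x = rng.normal(size=T)
y = np.exp(0.5 * x) + rng.normal(scale=0.2, size=T)   # invented non-linear mean

X0 = np.column_stack([np.ones(T), x])                 # the linear GM
mu = X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]       # fitted values mu_hat_t

# Augment with powers of the fitted values, m = 4 (so q = 3 restrictions here).
X1 = np.column_stack([X0, mu**2, mu**3, mu**4])
q, k = 3, X1.shape[1]

RRSS, URSS = rss(y, X0), rss(y, X1)
F = ((RRSS - URSS) / q) / (URSS / (T - k))            # F(q, T-k) under H0
print(F)
```

The only regressors added are functions of the single index μ̂_t, which is what makes the RESET alternative more restrictive, and cheaper in degrees of freedom, than the KG(3) alternative (77).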
Let us apply these tests to the money equation estimated in Section 19.4. The F-test based on (77) with terms up to third order (but excluding … because of collinearity with y_t) yielded:

FT(y) = ((0.117520 − 0.045477)/0.045477)(67/9) = 11.79.

Given that c_α = 2.02, the null hypothesis of linearity is strongly rejected.
Similarly, the RESET type test based on (82) with m = 4 (excluding μ̂_t² because of collinearity with μ̂_t) yielded:

FT(y) = ((0.117520 − 0.060280)/0.060280)(74/2) = 35.13.

Again, with c_α = 3.12, linearity is strongly rejected.
It is important to note that although the RESET type test is based on a
more restrictive form of the alternative (compare (77) with (82)) it might be
the only test available in the case where the degrees of freedom are at a
premium (see Chapter 23).
(3)
Tackling non-linearity
As argued in Section 21.1 the results of the various misspecification tests
should be considered simultaneously because the assumptions are closely
interrelated. For example in the case of the estimated money equation it is
highly likely that the linearity assumption was rejected because the independent sample assumption [8] is invalid. In cases, however, where the source of the departure is indeed the normality assumption (leading to non-linearity) we need to consider the question of how to proceed by relaxing the normality of {Z_t, t ∈ T}. One way to proceed is to postulate a general distribution D(y_t, X_t; ψ) and derive the specific form of the conditional expectation

E(y_t/X_t = x_t) = h(x_t).   (21.84)
Choosing the form of D(y_t, X_t; ψ) will determine both the form of the conditional expectation as well as the conditional variance (see Chapter 7).
An alternative way to proceed is to use some normalising transformation
on the original variables y_t and X_t so as to ensure that the transformed variables y_t* and X_t* are indeed jointly normal, and hence

E(y_t*/X_t* = x_t*) = β*′x_t*   (21.85)

and

Var(y_t*/X_t* = x_t*) = σ*².   (21.86)
The transformations considered in Section 21.2 in relation to normality are
also directly related to the problem of non-linearity. The Box–Cox transformation can be used with different values of θ for each random variable involved to linearise highly non-linear functional forms. In such a case the transformed r.v.'s take the general form

x_it* = (x_it^{θ_i} − 1)/θ_i,   i = 1, 2, …, k   (21.87)
(see Box and Tidwell (1962)).
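As a small illustration of how a transformation of the kind in (87) can linearise a relationship, take θ = 0 (the logarithmic limit of the Box–Cox transform) applied to both variables of a made-up multiplicative model; the data and parameter values below are invented.

```python
import numpy as np

def box_cox(z, theta):
    """Box-Cox transform (z**theta - 1)/theta, with log as the theta -> 0 limit."""
    return np.log(z) if theta == 0 else (z**theta - 1.0) / theta

def r2(y, X):
    """R-squared from OLS of y on the columns of X."""
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1.0 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(3)
T = 500
x = rng.uniform(0.5, 5.0, size=T)
y = 2.0 * x**3 * np.exp(rng.normal(scale=0.05, size=T))  # multiplicative model

r2_raw = r2(y, np.column_stack([np.ones(T), x]))         # linear in the originals
r2_trf = r2(box_cox(y, 0),
            np.column_stack([np.ones(T), box_cox(x, 0)]))  # linear after transform
print(r2_raw, r2_trf)
```

After transforming both variables the model is exactly linear in the parameters (log y = log 2 + 3 log x + ε), so the fit of the linear regression improves sharply.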
In practice non-linear regression models are used in conjunction with the
normality of the conditional distribution (see Judge et al. (1985), inter alia).
The question which naturally arises is, 'how can we reconcile the non-linearity of the conditional expectation and the normality of D(y_t/X_t; θ)?' As mentioned in Section 19.2, the linearity of μ_t = E(y_t/X_t = x_t) is a direct consequence of the normality of the joint distribution D(y_t, X_t; ψ). One way the non-linearity of E(y_t/X_t = x_t) and the normality of D(y_t/X_t; θ) can be
reconciled is to argue that the conditional distribution is normal in the
transformed variables X_t* = h(X_t), i.e. D(y_t/X_t* = x_t*; θ) linear in x_t* but non-linear in x_t, i.e.

E(y_t/X_t = x_t) = g(x_t, γ).   (21.88)
Moreover, the parameters of interest are not the linear regression parameters θ = (β, σ²) but φ = (γ, σ²). It must be emphasised that non-linearity in the present context refers to both non-linearity in the parameters (γ) and in the variables (x_t).
Non-linear regression models based on the statistical GM:
y_t = g(x_t, γ) + ε_t   (21.89)

can be estimated by least-squares based on the minimisation of

S(γ) = Σ_{t=1}^{T} (y_t − g(x_t, γ))².   (21.90)