
CHAPTER

7

Multivariate Linear
Regression Models

7.1

INTRODUCTION

Regression analysis is the statistical methodology for predicting values of one or more (dependent) response variables from a collection of (independent) predictor variable values. It can also be used for assessing the effects of the predictor variables on the responses. Unfortunately, the name regression, culled from the title of the first paper on the subject by F. Galton [14], in no way reflects either the importance or breadth of application of this methodology.

In this chapter, we first discuss the multiple regression model for the prediction of a single response. This model is then generalized to handle the prediction of several dependent variables. Our treatment must be somewhat terse, as a vast literature exists on the subject. (If you are interested in pursuing regression analysis, the following books, in ascending order of difficulty, are recommended: Bowerman and O'Connell [5], Neter, Wasserman, Kutner, and Nachtsheim [17], Draper and Smith [12], Cook and Weisberg [9], Seber [20], and Goldberger [15].) Our abbreviated treatment highlights the regression assumptions and their consequences, alternative formulations of the regression model, and the general applicability of regression techniques to seemingly different situations.

7.2

THE CLASSICAL LINEAR REGRESSION MODEL


Let z₁, z₂, …, z_r be r predictor variables thought to be related to a response variable Y. For example, with r = 4, we might have

Y = current market value of home

and

z₁ = square feet of living area
z₂ = location (indicator for zone of city)
z₃ = appraised value last year
z₄ = quality of construction (price per square foot)

The classical linear regression model states that Y is composed of a mean, which depends in a continuous manner on the zᵢ's, and a random error ε, which accounts for measurement error and the effects of other variables not explicitly considered in the model. The values of the predictor variables recorded from the experiment or set by the investigator are treated as fixed. The error (and hence the response) is viewed as a random variable whose behavior is characterized by a set of distributional assumptions.

Specifically, the linear regression model with a single response takes the form

$$ Y = \beta_0 + \beta_1 z_1 + \cdots + \beta_r z_r + \varepsilon $$

$$ [\text{Response}] = [\text{mean (depending on } z_1, z_2, \ldots, z_r)] + [\text{error}] $$

The term "linear" refers to the fact that the mean is a linear function of the unknown parameters β₀, β₁, …, β_r. The predictor variables may or may not enter the model as first-order terms.

With n independent observations on Y and the associated values of the zᵢ, the complete model becomes

$$
\begin{aligned}
Y_1 &= \beta_0 + \beta_1 z_{11} + \beta_2 z_{12} + \cdots + \beta_r z_{1r} + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 z_{21} + \beta_2 z_{22} + \cdots + \beta_r z_{2r} + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 z_{n1} + \beta_2 z_{n2} + \cdots + \beta_r z_{nr} + \varepsilon_n
\end{aligned}
$$  (7-1)

where the error terms are assumed to have the following properties:

1. E(εⱼ) = 0;
2. Var(εⱼ) = σ² (constant); and
3. Cov(εⱼ, εₖ) = 0, j ≠ k.    (7-2)

In matrix notation, (7-1) becomes

$$
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix}
1 & z_{11} & z_{12} & \cdots & z_{1r} \\
1 & z_{21} & z_{22} & \cdots & z_{2r} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & z_{n1} & z_{n2} & \cdots & z_{nr}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$

or

$$ \underset{(n \times 1)}{Y} = \underset{(n \times (r+1))}{Z}\; \underset{((r+1) \times 1)}{\beta} + \underset{(n \times 1)}{\varepsilon} $$

and the specifications in (7-2) become

1. E(ε) = 0; and
2. Cov(ε) = E(εε′) = σ²I.    (7-3)
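To make the matrix form concrete, the following is a minimal NumPy sketch of the model in (7-3). The sample size, coefficients, and error scale are illustrative assumptions, not values from the text; normal errors are one convenient way to satisfy the assumptions in (7-2).

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 2                                   # assumed sample size and number of predictors
sigma = 1.5                                    # assumed error standard deviation
beta = np.array([2.0, 0.5, -1.0])              # assumed beta_0, beta_1, beta_2

predictors = rng.uniform(0, 10, size=(n, r))   # the fixed z_j1, ..., z_jr values
Z = np.column_stack([np.ones(n), predictors])  # column of ones multiplies beta_0
eps = rng.normal(0.0, sigma, size=n)           # E(eps_j) = 0, Var(eps_j) = sigma^2, uncorrelated
Y = Z @ beta + eps                             # all n equations of (7-1) at once
```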



Note that a one in the first column of the design matrix Z is the multiplier of the constant term β₀. It is customary to introduce the artificial variable zⱼ₀ = 1, so that

β₀ + β₁zⱼ₁ + ⋯ + β_r zⱼᵣ = β₀zⱼ₀ + β₁zⱼ₁ + ⋯ + β_r zⱼᵣ

Each column of Z consists of the n values of the corresponding predictor variable, while the jth row of Z contains the values for all predictor variables on the jth trial. Although the error-term assumptions in (7-2) are very modest, we shall later need to add the assumption of joint normality for making confidence statements and testing hypotheses.

We now provide some examples of the linear regression model.
Example 7.1 (Fitting a straight-line regression model)

Determine the linear regression model for fitting a straight line

Mean response = E(Y) = β₀ + β₁z₁

to the data

z₁:  0  1  2  3  4
y:   1  4  3  8  9

Before the responses Y′ = [Y₁, Y₂, …, Y₅] are observed, the errors ε′ = [ε₁, ε₂, …, ε₅] are random, and we can write

Y = Zβ + ε

where

$$ Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_5 \end{bmatrix}, \quad Z = \begin{bmatrix} 1 & z_{11} \\ 1 & z_{21} \\ \vdots & \vdots \\ 1 & z_{51} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_5 \end{bmatrix} $$

The data for this model are contained in the observed response vector y and the design matrix Z, where

$$ y = \begin{bmatrix} 1 \\ 4 \\ 3 \\ 8 \\ 9 \end{bmatrix}, \qquad Z = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix} $$

Note that we can handle a quadratic expression for the mean response by introducing the term β₂z₂, with z₂ = z₁². The linear regression model for the jth trial in this latter case is

Yⱼ = β₀ + β₁zⱼ₁ + β₂zⱼ₂ + εⱼ

or

Yⱼ = β₀ + β₁zⱼ₁ + β₂zⱼ₁² + εⱼ  ■
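A sketch of the two design matrices in Example 7.1, assuming NumPy; note that the quadratic model is still linear in the β's, so handling it amounts to adding one more column to Z.

```python
import numpy as np

z1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 4.0, 3.0, 8.0, 9.0])

Z_line = np.column_stack([np.ones(5), z1])         # straight-line mean: beta_0 + beta_1*z1
Z_quad = np.column_stack([np.ones(5), z1, z1**2])  # quadratic mean: adds z2 = z1^2
```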


Example 7.2 (The design matrix for one-way ANOVA as a regression model)

Determine the design matrix if the linear regression model is applied to the one-way ANOVA situation in Example 6.6.

We create so-called dummy variables to handle the three population means: μ₁ = μ + τ₁, μ₂ = μ + τ₂, and μ₃ = μ + τ₃. We set

z₁ = 1 if the observation is from population 1, 0 otherwise
z₂ = 1 if the observation is from population 2, 0 otherwise
z₃ = 1 if the observation is from population 3, 0 otherwise

and β₀ = μ, β₁ = τ₁, β₂ = τ₂, β₃ = τ₃. Then

Yⱼ = β₀ + β₁zⱼ₁ + β₂zⱼ₂ + β₃zⱼ₃ + εⱼ,    j = 1, 2, …, 8

where we arrange the observations from the three populations in sequence. Thus, we obtain the observed response vector and design matrix

$$ \underset{(8 \times 1)}{y} = \begin{bmatrix} 9 \\ 6 \\ 9 \\ 0 \\ 2 \\ 3 \\ 1 \\ 2 \end{bmatrix}, \qquad \underset{(8 \times 4)}{Z} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix} \quad \blacksquare $$

The construction of dummy variables, as in Example 7.2, allows the whole of analysis of variance to be treated within the multiple linear regression framework.
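The dummy-variable construction is mechanical, so a short sketch may help; the population labels below follow the ordering used in Example 7.2.

```python
import numpy as np

population = np.array([1, 1, 1, 2, 2, 3, 3, 3])        # n = 8 labels, in sequence
dummies = (population[:, None] == np.array([1, 2, 3])).astype(float)
Z = np.column_stack([np.ones(8), dummies])             # columns: 1, z1, z2, z3
print(Z)                                               # reproduces the 8 x 4 design matrix above
```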


7.3

LEAST SQUARES ESTIMATION
One of the objectives of regression analysis is to develop an equation that will allow the investigator to predict the response for given values of the predictor variables. Thus, it is necessary to "fit" the model in (7-3) to the observed yⱼ corresponding to the known values 1, zⱼ₁, …, zⱼᵣ. That is, we must determine the values for the regression coefficients β and the error variance σ² consistent with the available data.

Let b be trial values for β. Consider the difference yⱼ − b₀ − b₁zⱼ₁ − ⋯ − b_r zⱼᵣ between the observed response yⱼ and the value b₀ + b₁zⱼ₁ + ⋯ + b_r zⱼᵣ that would be expected if b were the "true" parameter vector. Typically, the differences yⱼ − b₀ − b₁zⱼ₁ − ⋯ − b_r zⱼᵣ will not be zero, because the response fluctuates (in a manner characterized by the error term assumptions) about its expected value. The method of least squares selects b so as to minimize the sum of the squares of the differences:

$$ S(b) = \sum_{j=1}^{n} (y_j - b_0 - b_1 z_{j1} - \cdots - b_r z_{jr})^2 = (y - Zb)'(y - Zb) $$  (7-4)

The coefficients b chosen by the least squares criterion are called least squares estimates of the regression parameters β. They will henceforth be denoted by β̂ to emphasize their role as estimates of β.

The coefficients β̂ are consistent with the data in the sense that they produce estimated (fitted) mean responses, β̂₀ + β̂₁zⱼ₁ + ⋯ + β̂_r zⱼᵣ, the sum of whose squared differences from the observed yⱼ is as small as possible. The deviations

$$ \hat\varepsilon_j = y_j - \hat\beta_0 - \hat\beta_1 z_{j1} - \cdots - \hat\beta_r z_{jr}, \qquad j = 1, 2, \ldots, n $$  (7-5)

are called residuals. The vector of residuals ε̂ = y − Zβ̂ contains the information about the remaining unknown parameter σ². (See Result 7.2.)

Result 7.1. Let Z have full rank r + 1 ≤ n.¹ The least squares estimate of β in (7-3) is given by

$$ \hat\beta = (Z'Z)^{-1}Z'y $$

Let ŷ = Zβ̂ = Hy denote the fitted values of y, where H = Z(Z′Z)⁻¹Z′ is called the "hat" matrix. Then the residuals

$$ \hat\varepsilon = y - \hat y = [I - Z(Z'Z)^{-1}Z']y = (I - H)y $$

satisfy Z′ε̂ = 0 and ŷ′ε̂ = 0. Also, the

$$ \text{residual sum of squares} = \sum_{j=1}^{n} (y_j - \hat\beta_0 - \hat\beta_1 z_{j1} - \cdots - \hat\beta_r z_{jr})^2 = \hat\varepsilon'\hat\varepsilon = y'[I - Z(Z'Z)^{-1}Z']y = y'y - y'Z\hat\beta $$

¹If Z is not full rank, (Z′Z)⁻¹ is replaced by (Z′Z)⁻, a generalized inverse of Z′Z. (See Exercise 7.6.)



"
1
"
"
Let
Z'
y
as
as
s
e
rt
e
d.
Then
=
y
y
f3
=
(

Z
'
Z
)
1
1
[I - Z( Z ' Z )- Z' ] y. 'The-1matZ' 'ri=x [[II - Z(Z(ZZ''ZZ))--1ZZ'']] satisf(iseysmmetric); = y- Z/3 =
[[II -- ZZ((ZZ'ZZ))-1 Z' JJ [I - Z-(Z' Z)-1Z ' ]
== I[I--2Z(Z (ZZ''ZZ))--11ZZ''] (Z(idempot
Z' Z)-e1ntZ ')Z; ( Z ' Z )-1Z'
Z' [I - Z (Z' Z)-1Z' J = Z' - Z' = 1
Cons
e
quent
l
y
,
Z
'
e
=
Z'
(
y
y)
=
Z'
[
I
Z(

Z
'
Z
)
Z
'
]
y
=
O,
s
o
y'
e
=
p ' Z ' e1 =
1
1
Addi
t
i
o
nal
l
y
,
Z
'
Z
r

(
Z
'
Z
r
'
£
=
y'
[
I
-Z(
z
'
]
[
I
z
'
]
y
=
y'
[
I
-z(
z
'
z
r

z
'
]
y

= y' y - y' Zf3 . To verify the expr" esZion for f3 , we write
Zb
=
y
Z
y
Zb
=
y
Z
Z
(
f3
-b)
f3
f3
f3
so S(b) = (y - Zb)' (y - Zb)
= (y 2(-yZ/3)'- Zf3)'("y -ZZ/3)( f3"" - b)(/3" - b)' Z' Z ( /3" - b)
= (y - Z/3)'" (y - Z/3)" (/3 - b)' Z' Z ( /3" - b)
nbandt
h
e
ssiencce(�ndyis-thZP)'e squarZ e=d e'leZngt=h of ZTh( f3� f-irsb)t te. rBecaus
m in S(ebZ)doesnotdepend


has
f
u
l
r
a
nk,
Z
(
f3
-b)
i(fZf3' Z )-1b,Z'yso. Nottheemithnat(imZum'Z)s-u1mexiofstsssqinuarceZ'es Zis hasruniqauenkrand occurs (fIofrZ'bZ=isf3not=
oftrafdiulctrsaZnk,haviZ'nZag fu=l rafonkr srome a but then a' Z' Za = or Za = which con­
Res
u
l
t
s
h
ows
how
t
h
e
l
e
as
t
s

q
uar
e
s
es
t
i
m
at
e
s
p and the residuals e can be
obtained from the design matrix Z and responses y by simple matrix operations.
Proof.

£

"'

1.

2.

+

(7-6)

0.

3.


o.



"'

+

"'

"

"'

+

+

+

+

"'

0'.

#

0


+ 1
0

# 0,

<

n.

f0

0,

+ 1.)



7.1
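Result 7.1 translates directly into code. The sketch below forms β̂, the hat matrix, and the residuals; solve() is used in place of an explicit inverse for numerical stability, but the algebra is that of Result 7.1. (In practice, library routines based on the QR decomposition, such as numpy.linalg.lstsq, are preferred for ill-conditioned Z.)

```python
import numpy as np

def least_squares(Z, y):
    """Least squares quantities of Result 7.1."""
    beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)   # (Z'Z)^{-1} Z'y
    H = Z @ np.linalg.solve(Z.T @ Z, Z.T)          # hat matrix Z(Z'Z)^{-1}Z'
    y_hat = H @ y                                  # fitted values
    resid = y - y_hat                              # (I - H) y
    rss = resid @ resid                            # residual sum of squares
    return beta_hat, y_hat, resid, rss
```

The properties Z′ε̂ = 0 and ŷ′ε̂ = 0 can be confirmed numerically with np.allclose(Z.T @ resid, 0) and np.allclose(y_hat @ resid, 0).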

Example 7.3 (Calculating the least squares estimates, the residuals, and the residual sum of squares)

Calculate the least squares estimates β̂, the residuals ε̂, and the residual sum of squares for a straight-line model

ŷⱼ = β̂₀ + β̂₁zⱼ₁

fit to the data

z₁:  0  1  2  3  4
y:   1  4  3  8  9

We have

$$ Z' = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 4 \end{bmatrix}, \quad y = \begin{bmatrix} 1 \\ 4 \\ 3 \\ 8 \\ 9 \end{bmatrix}, \quad Z'Z = \begin{bmatrix} 5 & 10 \\ 10 & 30 \end{bmatrix}, \quad Z'y = \begin{bmatrix} 25 \\ 70 \end{bmatrix}, \quad (Z'Z)^{-1} = \begin{bmatrix} .6 & -.2 \\ -.2 & .1 \end{bmatrix} $$

Consequently,

$$ \hat\beta = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix} = (Z'Z)^{-1}Z'y = \begin{bmatrix} .6 & -.2 \\ -.2 & .1 \end{bmatrix} \begin{bmatrix} 25 \\ 70 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} $$

and the fitted equation is

ŷ = 1 + 2z

The vector of fitted (predicted) values is

$$ \hat y = Z\hat\beta = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 5 \\ 7 \\ 9 \end{bmatrix} $$

so

$$ \hat\varepsilon = y - \hat y = \begin{bmatrix} 1 \\ 4 \\ 3 \\ 8 \\ 9 \end{bmatrix} - \begin{bmatrix} 1 \\ 3 \\ 5 \\ 7 \\ 9 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ -2 \\ 1 \\ 0 \end{bmatrix} $$

The residual sum of squares is

ε̂′ε̂ = 0² + 1² + (−2)² + 1² + 0² = 6  ■
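A quick numerical check of Example 7.3 (a sketch; the printed values should agree with the hand computation above).

```python
import numpy as np

z1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 4.0, 3.0, 8.0, 9.0])
Z = np.column_stack([np.ones(5), z1])

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
resid = y - Z @ beta_hat
print(beta_hat)        # [1. 2.]
print(resid)           # [ 0.  1. -2.  1.  0.]
print(resid @ resid)   # 6.0
```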



Sum-of-Squares Decomposition

According to Result 7.1, ŷ′ε̂ = 0, so the total response sum of squares y′y = Σⱼyⱼ² satisfies

$$ y'y = (\hat y + y - \hat y)'(\hat y + y - \hat y) = (\hat y + \hat\varepsilon)'(\hat y + \hat\varepsilon) = \hat y'\hat y + \hat\varepsilon'\hat\varepsilon $$  (7-7)

Since the first column of Z is 1, the condition Z′ε̂ = 0 includes the requirement 0 = 1′ε̂ = Σⱼε̂ⱼ = Σⱼyⱼ − Σⱼŷⱼ, or ȳ = mean of the ŷⱼ. Subtracting nȳ² from both sides of the decomposition in (7-7), we obtain the basic decomposition of the sum of squares about the mean:

y′y − nȳ² = ŷ′ŷ − nȳ² + ε̂′ε̂

or

$$ \sum_{j=1}^{n} (y_j - \bar y)^2 = \sum_{j=1}^{n} (\hat y_j - \bar y)^2 + \sum_{j=1}^{n} \hat\varepsilon_j^2 $$  (7-8)

$$ \begin{pmatrix} \text{total sum of squares} \\ \text{about mean} \end{pmatrix} = \begin{pmatrix} \text{regression} \\ \text{sum of squares} \end{pmatrix} + \begin{pmatrix} \text{residual (error)} \\ \text{sum of squares} \end{pmatrix} $$

The preceding sum of squares decomposition suggests that the quality of the model's fit can be measured by the coefficient of determination

$$ R^2 = 1 - \frac{\sum_{j=1}^{n} \hat\varepsilon_j^2}{\sum_{j=1}^{n} (y_j - \bar y)^2} = \frac{\sum_{j=1}^{n} (\hat y_j - \bar y)^2}{\sum_{j=1}^{n} (y_j - \bar y)^2} $$  (7-9)

The quantity R² gives the proportion of the total variation in the yⱼ's "explained" by, or attributable to, the predictor variables z₁, z₂, …, z_r. Here R² (or the multiple correlation coefficient R = +√R²) equals 1 if the fitted equation passes through all the data points, so that ε̂ⱼ = 0 for all j. At the other extreme, R² is 0 if β̂₀ = ȳ and β̂₁ = β̂₂ = ⋯ = β̂_r = 0. In this case, the predictor variables z₁, z₂, …, z_r have no influence on the response.

Geometry of Least Squares

A geometrical interpretation of the least squares technique highlights the nature of the concept. According to the classical linear regression model,

$$ \text{Mean response vector} = E(Y) = Z\beta = \beta_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} + \beta_1 \begin{bmatrix} z_{11} \\ z_{21} \\ \vdots \\ z_{n1} \end{bmatrix} + \cdots + \beta_r \begin{bmatrix} z_{1r} \\ z_{2r} \\ \vdots \\ z_{nr} \end{bmatrix} $$

Thus, E(Y) is a linear combination of the columns of Z. As β varies, Zβ spans the model plane of all linear combinations. Usually, the observation vector y will not lie in the model plane, because of the random error ε; that is, y is not (exactly) a linear combination of the columns of Z. Recall that

$$ \begin{pmatrix} y \\ \text{response vector} \end{pmatrix} = \begin{pmatrix} Z\beta \\ \text{vector in model plane} \end{pmatrix} + \begin{pmatrix} \varepsilon \\ \text{error vector} \end{pmatrix} $$

[Figure 7.1  Least squares as a projection for n = 3, r = 1.]

Once the observations become available, the least squares solution is derived from the deviation vector

y − Zb = (observation vector) − (vector in model plane)

The squared length (y − Zb)′(y − Zb) is the sum of squares S(b). As illustrated in Figure 7.1, S(b) is as small as possible when b is selected such that Zb is the point in the model plane closest to y. This point occurs at the tip of the perpendicular projection of y on the plane. That is, for the choice b = β̂, ŷ = Zβ̂ is the projection of y on the plane consisting of all linear combinations of the columns of Z. The residual vector ε̂ = y − ŷ is perpendicular to that plane. This geometry holds even when Z is not of full rank.

When Z has full rank, the projection operation is expressed analytically as multiplication by the matrix Z(Z′Z)⁻¹Z′. To see this, we use the spectral decomposition (2-16) to write

$$ Z'Z = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_{r+1} e_{r+1} e_{r+1}' $$

where λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_{r+1} > 0 are the eigenvalues of Z′Z and e₁, e₂, …, e_{r+1} are the corresponding eigenvectors. If Z is of full rank,

$$ (Z'Z)^{-1} = \frac{1}{\lambda_1} e_1 e_1' + \frac{1}{\lambda_2} e_2 e_2' + \cdots + \frac{1}{\lambda_{r+1}} e_{r+1} e_{r+1}' $$

Consider qᵢ = λᵢ^{−1/2} Z eᵢ, which is a linear combination of the columns of Z. Then qᵢ′qₖ = λᵢ^{−1/2} λₖ^{−1/2} eᵢ′Z′Zeₖ = λᵢ^{−1/2} λₖ^{−1/2} eᵢ′λₖeₖ = 0 if i ≠ k, or 1 if i = k. That is, the r + 1 vectors qᵢ are mutually perpendicular and have unit length. Their linear combinations span the space of all linear combinations of the columns of Z. Moreover,

$$ Z(Z'Z)^{-1}Z' = \sum_{i=1}^{r+1} \lambda_i^{-1} Z e_i e_i' Z' = \sum_{i=1}^{r+1} q_i q_i' $$

According to Result 2A.2 and Definition 2A.12, the projection of y on a linear combination of {q₁, q₂, …, q_{r+1}} is

$$ \sum_{i=1}^{r+1} (q_i' y) q_i = \left( \sum_{i=1}^{r+1} q_i q_i' \right) y = Z(Z'Z)^{-1}Z'y = Z\hat\beta $$

Thus, multiplication by Z(Z′Z)⁻¹Z′ projects a vector onto the space spanned by the columns of Z. Similarly, [I − Z(Z′Z)⁻¹Z′] is the matrix for the projection of y on the plane perpendicular to the plane spanned by the columns of Z.
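The identity Z(Z′Z)⁻¹Z′ = Σqᵢqᵢ′ can be verified numerically. The sketch below reuses the Example 7.3 design matrix; eigh() returns the eigenvalues and eigenvectors of the symmetric matrix Z′Z.

```python
import numpy as np

z1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Z = np.column_stack([np.ones(5), z1])

lam, E = np.linalg.eigh(Z.T @ Z)            # columns of E are e_1, ..., e_{r+1}
Q = (Z @ E) / np.sqrt(lam)                  # q_i = lambda_i^{-1/2} Z e_i
H = Z @ np.linalg.solve(Z.T @ Z, Z.T)       # the projection matrix Z(Z'Z)^{-1}Z'

print(np.allclose(Q @ Q.T, H))              # True: sum of q_i q_i' equals the projector
print(np.allclose(Q.T @ Q, np.eye(2)))      # True: the q_i are orthonormal
```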

Sampling Properties of Classical Least Squares Estimators

The least squares estimator β̂ and the residuals ε̂ have the sampling properties detailed in the next result.

Result 7.2. Under the general linear regression model in (7-3), the least squares estimator β̂ = (Z′Z)⁻¹Z′Y has

E(β̂) = β  and  Cov(β̂) = σ²(Z′Z)⁻¹

The residuals ε̂ have the properties

E(ε̂) = 0  and  Cov(ε̂) = σ²[I − Z(Z′Z)⁻¹Z′] = σ²[I − H]

Also, E(ε̂′ε̂) = (n − r − 1)σ², so defining

$$ s^2 = \frac{\hat\varepsilon'\hat\varepsilon}{n - r - 1} = \frac{Y'[I - Z(Z'Z)^{-1}Z']Y}{n - r - 1} = \frac{Y'[I - H]Y}{n - r - 1} $$

we have

E(s²) = σ²

Moreover, β̂ and ε̂ are uncorrelated.²

Proof. Before the response Y = Zβ + ε is observed, it is a random vector. Now,

β̂ = (Z′Z)⁻¹Z′Y = (Z′Z)⁻¹Z′(Zβ + ε) = β + (Z′Z)⁻¹Z′ε
ε̂ = [I − Z(Z′Z)⁻¹Z′]Y = [I − Z(Z′Z)⁻¹Z′][Zβ + ε] = [I − Z(Z′Z)⁻¹Z′]ε    (7-10)

²If Z is not of full rank, we can use the generalized inverse (Z′Z)⁻ = Σᵢ₌₁^{r₁+1} λᵢ⁻¹eᵢeᵢ′, where λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_{r₁+1} > 0 = λ_{r₁+2} = ⋯ = λ_{r+1}, as described in Exercise 7.6. Then Z(Z′Z)⁻Z′ = Σᵢ₌₁^{r₁+1} qᵢqᵢ′ has rank r₁ + 1 and generates the unique projection of y on the space spanned by the linearly independent columns of Z. This is true for any choice of the generalized inverse. (See [20].)

since [I − Z(Z′Z)⁻¹Z′]Z = Z − Z = 0. From (2-24) and (2-45),

E(β̂) = β + (Z′Z)⁻¹Z′E(ε) = β
Cov(β̂) = (Z′Z)⁻¹Z′ Cov(ε) Z(Z′Z)⁻¹ = σ²(Z′Z)⁻¹Z′Z(Z′Z)⁻¹ = σ²(Z′Z)⁻¹

E(ε̂) = [I − Z(Z′Z)⁻¹Z′]E(ε) = 0
Cov(ε̂) = [I − Z(Z′Z)⁻¹Z′] Cov(ε) [I − Z(Z′Z)⁻¹Z′]′ = σ²[I − Z(Z′Z)⁻¹Z′]

where the last equality follows from (7-6). Also,

Cov(β̂, ε̂) = E[(β̂ − β)ε̂′] = (Z′Z)⁻¹Z′E(εε′)[I − Z(Z′Z)⁻¹Z′] = σ²(Z′Z)⁻¹Z′[I − Z(Z′Z)⁻¹Z′] = 0

because Z′[I − Z(Z′Z)⁻¹Z′] = 0. From (7-10) and (7-6),

ε̂′ε̂ = ε′[I − Z(Z′Z)⁻¹Z′][I − Z(Z′Z)⁻¹Z′]ε = ε′[I − Z(Z′Z)⁻¹Z′]ε = tr([I − Z(Z′Z)⁻¹Z′]εε′)

Now, for an arbitrary n × n random matrix W,

E(tr(W)) = E(W₁₁ + W₂₂ + ⋯ + Wₙₙ) = E(W₁₁) + E(W₂₂) + ⋯ + E(Wₙₙ) = tr[E(W)]

Thus, using Result 2A.12, we obtain

E(ε̂′ε̂) = tr([I − Z(Z′Z)⁻¹Z′]E(εε′)) = σ² tr[I − Z(Z′Z)⁻¹Z′]
= σ² tr(I) − σ² tr[Z(Z′Z)⁻¹Z′] = nσ² − σ² tr[(Z′Z)⁻¹Z′Z]
= nσ² − σ² tr(I₍ᵣ₊₁₎ₓ₍ᵣ₊₁₎) = σ²(n − r − 1)

and the result for s² = ε̂′ε̂/(n − r − 1) follows.  ■

The least squares estimator β̂ possesses a minimum variance property that was first established by Gauss. The following result concerns "best" estimators of linear parametric functions of the form c′β = c₀β₀ + c₁β₁ + ⋯ + c_r β_r for any c.

Result 7.3 (Gauss'³ least squares theorem). Let Y = Zβ + ε, where E(ε) = 0, Cov(ε) = σ²I, and Z has full rank r + 1. For any c, the estimator

c′β̂ = c₀β̂₀ + c₁β̂₁ + ⋯ + c_r β̂_r

³Much later, Markov proved a less general result, which misled many writers into attaching his name to this theorem.



of c′β has the smallest possible variance among all linear estimators of the form

a′Y = a₁Y₁ + a₂Y₂ + ⋯ + aₙYₙ

that are unbiased for c′β.

Proof. For any fixed c, let a′Y be any unbiased estimator of c′β. Then E(a′Y) = c′β, whatever the value of β. Also, by assumption, E(a′Y) = E(a′Zβ + a′ε) = a′Zβ. Equating the two expected values yields a′Zβ = c′β or (c′ − a′Z)β = 0 for all β, including the choice β = (c′ − a′Z)′. This implies that c′ = a′Z for any unbiased estimator.

Now, c′β̂ = c′(Z′Z)⁻¹Z′Y = a*′Y with a* = Z(Z′Z)⁻¹c. Moreover, from Result 7.2, E(β̂) = β, so c′β̂ = a*′Y is an unbiased estimator of c′β. Thus, for any a satisfying the unbiased requirement c′ = a′Z,

Var(a′Y) = Var(a′Zβ + a′ε) = Var(a′ε) = a′Iσ²a
= σ²(a − a* + a*)′(a − a* + a*) = σ²[(a − a*)′(a − a*) + a*′a*]

since (a − a*)′a* = (a − a*)′Z(Z′Z)⁻¹c = 0 from (a − a*)′Z = c′ − c′ = 0′. Because a* is fixed and (a − a*)′(a − a*) is positive unless a = a*, Var(a′Y) is minimized by the choice a*′Y = c′(Z′Z)⁻¹Z′Y = c′β̂.  ■

This powerful result states that substitution of β̂ for β leads to the best estimator of c′β for any c of interest. In statistical terminology, the estimator c′β̂ is called the best (minimum-variance) linear unbiased estimator (BLUE) of c′β.

7.4

INFERENCES ABOUT THE REGRESSION MODEL

We describe inferential procedures based on the classical linear regression model in (7-3) with the additional (tentative) assumption that the errors ε have a normal distribution. Methods for checking the general adequacy of the model are considered in Section 7.6.

Inferences Concerning the Regression Parameters

Before we can assess the importance of particular variables in the regression function

$$ E(Y) = \beta_0 + \beta_1 z_1 + \cdots + \beta_r z_r $$  (7-11)

we must determine the sampling distributions of β̂ and the residual sum of squares ε̂′ε̂. To do so, we shall assume that the errors ε have a normal distribution.

Result 7.4. Let Y = Zβ + ε, where Z has full rank r + 1 and ε is distributed as Nₙ(0, σ²I). Then the maximum likelihood estimator of β is the same as the least squares estimator β̂. Moreover, β̂ = (Z′Z)⁻¹Z′Y is distributed as N_{r+1}(β, σ²(Z′Z)⁻¹)

and is distributed independently of the residuals ε̂ = Y − Zβ̂. Further,

nσ̂² = ε̂′ε̂ is distributed as σ²χ²₍ₙ₋ᵣ₋₁₎

where σ̂² is the maximum likelihood estimator of σ².

Proof. Given the data and the normal assumption for the errors, the likelihood function for β, σ² is

$$ L(\beta, \sigma^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\varepsilon_j^2/2\sigma^2} = \frac{1}{(2\pi)^{n/2}\sigma^n} e^{-\varepsilon'\varepsilon/2\sigma^2} = \frac{1}{(2\pi)^{n/2}\sigma^n} e^{-(y - Z\beta)'(y - Z\beta)/2\sigma^2} $$

For a fixed value of σ², the likelihood is maximized by minimizing (y − Zβ)′(y − Zβ). But this minimization yields the least squares estimate β̂ = (Z′Z)⁻¹Z′y, which does not depend upon σ². Therefore, under the normal assumption, the maximum likelihood and least squares approaches provide the same estimator β̂. Next, maximizing L(β̂, σ²) over σ² [see (4-18)] gives

$$ L(\hat\beta, \hat\sigma^2) = \frac{1}{(2\pi)^{n/2} (\hat\sigma^2)^{n/2}} e^{-n/2} \quad \text{where} \quad \hat\sigma^2 = \frac{(y - Z\hat\beta)'(y - Z\hat\beta)}{n} $$  (7-12)

From (7-10), we can express β̂ and ε̂ as linear combinations of the normal variables ε. Specifically,

$$ \begin{bmatrix} \hat\beta \\ \hat\varepsilon \end{bmatrix} = \begin{bmatrix} \beta \\ 0 \end{bmatrix} + \begin{bmatrix} (Z'Z)^{-1}Z' \\ I - Z(Z'Z)^{-1}Z' \end{bmatrix} \varepsilon = a + A\varepsilon $$

Because Z is fixed, Result 4.3 implies the joint normality of β̂ and ε̂. Their mean vectors and covariance matrices were obtained in Result 7.2. Again, using (7-6), we get

$$ \operatorname{Cov}\left( \begin{bmatrix} \hat\beta \\ \hat\varepsilon \end{bmatrix} \right) = A \operatorname{Cov}(\varepsilon) A' = \sigma^2 \begin{bmatrix} (Z'Z)^{-1} & 0' \\ 0 & I - Z(Z'Z)^{-1}Z' \end{bmatrix} $$

Since Cov(β̂, ε̂) = 0 for the normal random vectors β̂ and ε̂, these vectors are independent. (See Result 4.5.)

Next, let (λ, e) be any eigenvalue–eigenvector pair for I − Z(Z′Z)⁻¹Z′. Then, by (7-6), [I − Z(Z′Z)⁻¹Z′]² = [I − Z(Z′Z)⁻¹Z′], so

λe = [I − Z(Z′Z)⁻¹Z′]e = [I − Z(Z′Z)⁻¹Z′]²e = λ[I − Z(Z′Z)⁻¹Z′]e = λ²e

That is, λ = 0 or 1. Now, tr[I − Z(Z′Z)⁻¹Z′] = n − r − 1 (see the proof of Result 7.2), and from Result 2A.12, tr[I − Z(Z′Z)⁻¹Z′] = λ₁ + λ₂ + ⋯ + λₙ, where λ₁ ≥ λ₂ ≥ ⋯ ≥ λₙ are the eigenvalues of [I − Z(Z′Z)⁻¹Z′]. Consequently, exactly n − r − 1 values of λᵢ equal one, and the rest are zero. It then follows from the spectral decomposition that

$$ I - Z(Z'Z)^{-1}Z' = \sum_{i=1}^{n-r-1} e_i e_i' $$  (7-13)

where e₁, e₂, …, e_{n−r−1} are the normalized eigenvectors associated with the eigenvalues λ₁ = λ₂ = ⋯ = λ_{n−r−1} = 1. Let

$$ V = \begin{bmatrix} e_1'\varepsilon \\ e_2'\varepsilon \\ \vdots \\ e_{n-r-1}'\varepsilon \end{bmatrix} $$

Then V is normal with mean vector 0 and

$$ \operatorname{Cov}(V_i, V_k) = e_i' E(\varepsilon\varepsilon') e_k = \sigma^2 e_i'e_k = \begin{cases} \sigma^2, & i = k \\ 0, & \text{otherwise} \end{cases} $$

That is, the Vᵢ are independent N(0, σ²), and by (7-10),

nσ̂² = ε̂′ε̂ = ε′[I − Z(Z′Z)⁻¹Z′]ε = Σᵢ₌₁^{n−r−1}(eᵢ′ε)² = V₁² + V₂² + ⋯ + V²₍ₙ₋ᵣ₋₁₎

is distributed as σ²χ²₍ₙ₋ᵣ₋₁₎.  ■

A confidence ellipsoid for β is easily constructed. It is expressed in terms of the estimated covariance matrix s²(Z′Z)⁻¹, where s² = ε̂′ε̂/(n − r − 1).

Result 7.5. Let Y = Zβ + ε, where Z has full rank r + 1 and ε is Nₙ(0, σ²I). Then a 100(1 − α)% confidence region for β is given by

$$ (\beta - \hat\beta)' Z'Z (\beta - \hat\beta) \le (r + 1)\, s^2\, F_{r+1,\,n-r-1}(\alpha) $$

where F_{r+1,n−r−1}(α) is the upper (100α)th percentile of an F-distribution with r + 1 and n − r − 1 d.f.

Also, simultaneous 100(1 − α)% confidence intervals for the βᵢ are given by

$$ \hat\beta_i \pm \sqrt{\widehat{\operatorname{Var}}(\hat\beta_i)} \sqrt{(r+1) F_{r+1,\,n-r-1}(\alpha)}, \qquad i = 0, 1, \ldots, r $$

where V̂ar(β̂ᵢ) is the diagonal element of s²(Z′Z)⁻¹ corresponding to β̂ᵢ.

Proof. Consider the symmetric square-root matrix (Z′Z)^{1/2}. [See (2-22).] Set V = (Z′Z)^{1/2}(β̂ − β) and note that E(V) = 0 and

Cov(V) = (Z′Z)^{1/2} Cov(β̂) (Z′Z)^{1/2} = σ²(Z′Z)^{1/2}(Z′Z)⁻¹(Z′Z)^{1/2} = σ²I

and V is normally distributed, since it consists of linear combinations of the β̂ᵢ's. Therefore,

V′V = (β̂ − β)′(Z′Z)^{1/2}(Z′Z)^{1/2}(β̂ − β) = (β̂ − β)′Z′Z(β̂ − β)

is distributed as σ²χ²₍ᵣ₊₁₎. By Result 7.4, (n − r − 1)s² = ε̂′ε̂ is distributed as σ²χ²₍ₙ₋ᵣ₋₁₎, independently of β̂ and, hence, independently of V. Consequently, [χ²₍ᵣ₊₁₎/(r + 1)]/[χ²₍ₙ₋ᵣ₋₁₎/(n − r − 1)] = [V′V/(r + 1)]/s² has an F_{r+1,n−r−1} distribution, and the confidence ellipsoid for β follows. Projecting this ellipsoid for (β̂ − β) using Result 5A.1 with A⁻¹ = Z′Z, c² = (r + 1)F_{r+1,n−r−1}(α), and u′ = [0, …, 0, 1, 0, …, 0] yields |βᵢ − β̂ᵢ| ≤ √((r + 1)F_{r+1,n−r−1}(α)) √(V̂ar(β̂ᵢ)), where V̂ar(β̂ᵢ) is the diagonal element of s²(Z′Z)⁻¹ corresponding to β̂ᵢ.  ■

The confidence ellipsoid is centered at the maximum likelihood estimate β̂, and its orientation and size are determined by the eigenvalues and eigenvectors of Z′Z. If an eigenvalue is nearly zero, the confidence ellipsoid will be very long in the direction of the corresponding eigenvector.

Practitioners often ignore the "simultaneous" confidence property of the interval estimates in Result 7.5. Instead, they replace (r + 1)F_{r+1,n−r−1}(α) with the one-at-a-time t value t_{n−r−1}(α/2) and use the intervals

$$ \hat\beta_i \pm t_{n-r-1}\left(\frac{\alpha}{2}\right) \sqrt{\widehat{\operatorname{Var}}(\hat\beta_i)} $$  (7-14)

when searching for important predictor variables.

Example 7.4 (Fitting a regression model to real-estate data)

The assessment data in Table 7.1 were gathered from 20 homes in a Milwaukee, Wisconsin, neighborhood. Fit the regression model

$$ Y_j = \beta_0 + \beta_1 z_{j1} + \beta_2 z_{j2} + \varepsilon_j $$

where z₁ = total dwelling size (in hundreds of square feet), z₂ = assessed value (in thousands of dollars), and Y = selling price (in thousands of dollars), to these data using the method of least squares.

TABLE 7.1  REAL-ESTATE DATA
(20 homes; z₁ = total dwelling size in hundreds of square feet, z₂ = assessed value in thousands of dollars, y = selling price in thousands of dollars. The individual measurements are in the file T7-1.dat read by the SAS program in Panel 7.1.)

A computer calculation yields

$$ (Z'Z)^{-1} = \begin{bmatrix} 5.1523 & & \\ .2544 & .0512 & \\ -.1463 & -.0172 & .0067 \end{bmatrix} \quad \text{(symmetric; lower triangle shown)} $$

and

$$ \hat\beta = (Z'Z)^{-1}Z'y = \begin{bmatrix} 30.967 \\ 2.634 \\ .045 \end{bmatrix} $$

Thus, the fitted equation is

ŷ = 30.967 + 2.634z₁ + .045z₂
        (7.88)     (.785)     (.285)

with s = 3.473. The numbers in parentheses are the estimated standard deviations of the least squares coefficients. Also, R² = .834, indicating that the data exhibit a strong regression relationship. (See Panel 7.1, which contains the regression analysis of these data using the SAS statistical software package.)

PANEL 7.1  SAS ANALYSIS FOR EXAMPLE 7.4 USING PROC REG.

PROGRAM COMMANDS:

title 'Regression Analysis';
data estate;
infile 'T7-1.dat';
input z1 z2 y;
proc reg data = estate;
model y = z1 z2;

OUTPUT:

Model: MODEL1
Dependent Variable: Y

Analysis of Variance

Source    DF   Sum of Squares   Mean Square   F Value   Prob > F
Model      2       1032.87506     516.43753    42.828     0.0001
Error     17        204.99494      12.05853
C Total   19       1237.87000

Root MSE    3.47254    R-square   0.8344
Dep Mean   76.55000    Adj R-sq   0.8149
C.V.        4.53630

Parameter Estimates

Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter = 0   Prob > |T|
INTERCEP    1         30.967              7.88               3.929                0.0011
z1          1          2.634               .785               3.353                0.0038
z2          1           .045               .285               0.158                0.8760
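The same analysis can be reproduced outside SAS. Here is a sketch in NumPy/SciPy, assuming the 20 rows of Table 7.1 are available in T7-1.dat (the file read by the SAS program above) with columns z1, z2, y; it should reproduce the coefficient estimates, s, R², and the interval for β₂ discussed below.

```python
import numpy as np
from scipy import stats

data = np.loadtxt("T7-1.dat")           # columns: z1, z2, y
z, y = data[:, :2], data[:, 2]
n = len(y)
Z = np.column_stack([np.ones(n), z])
r = Z.shape[1] - 1                      # r = 2 predictors

ZtZ_inv = np.linalg.inv(Z.T @ Z)
beta_hat = ZtZ_inv @ Z.T @ y            # approx [30.967, 2.634, .045]
resid = y - Z @ beta_hat
s2 = resid @ resid / (n - r - 1)        # s approx 3.473
se = np.sqrt(s2 * np.diag(ZtZ_inv))     # approx [7.88, .785, .285]
R2 = 1 - resid @ resid / np.sum((y - y.mean())**2)       # approx .834

t = stats.t.ppf(1 - 0.025, n - r - 1)   # t_17(.025) = 2.110
print(beta_hat[2] - t * se[2], beta_hat[2] + t * se[2])  # approx (-.556, .647)
```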



If the residuals ε̂ pass the diagnostic checks described in Section 7.6, the fitted equation could be used to predict the selling price of another house in the neighborhood from its size and assessed value. We note that a 95% confidence interval for β₂ [see (7-14)] is given by

β̂₂ ± t₁₇(.025)√(V̂ar(β̂₂)) = .045 ± 2.110(.285)

or

(−.556, .647)

Since the confidence interval includes β₂ = 0, the variable z₂ might be dropped from the regression model and the analysis repeated with the single predictor variable z₁. Given dwelling size, assessed value seems to add little to the prediction of selling price.  ■

Likelihood Ratio Tests for the Regression Parameters

Part of regression analysis is concerned with assessing the effects of particular predictor variables on the response variable. One null hypothesis of interest states that certain of the zᵢ's do not influence the response Y. These predictors will be labeled z_{q+1}, z_{q+2}, …, z_r. The statement that z_{q+1}, z_{q+2}, …, z_r do not influence Y translates into the statistical hypothesis

$$ H_0: \beta_{q+1} = \beta_{q+2} = \cdots = \beta_r = 0 \quad \text{or} \quad H_0: \beta_{(2)} = 0 $$  (7-15)

where β₍₂₎ = [β_{q+1}, β_{q+2}, …, β_r]′.

Setting

$$ Z = \left[ \underset{n \times (q+1)}{Z_1} \;\middle|\; \underset{n \times (r-q)}{Z_2} \right], \qquad \beta = \begin{bmatrix} \beta_{(1)} \\ \beta_{(2)} \end{bmatrix} $$

we can express the general linear model as

$$ Y = Z\beta + \varepsilon = [Z_1 \; Z_2] \begin{bmatrix} \beta_{(1)} \\ \beta_{(2)} \end{bmatrix} + \varepsilon = Z_1\beta_{(1)} + Z_2\beta_{(2)} + \varepsilon $$

Under the null hypothesis H₀: β₍₂₎ = 0, Y = Z₁β₍₁₎ + ε. The likelihood ratio test of H₀ is based on the

$$ \text{Extra sum of squares} = SS_{\mathrm{res}}(Z_1) - SS_{\mathrm{res}}(Z) = (y - Z_1\hat\beta_{(1)})'(y - Z_1\hat\beta_{(1)}) - (y - Z\hat\beta)'(y - Z\hat\beta) $$  (7-16)

where β̂₍₁₎ = (Z₁′Z₁)⁻¹Z₁′y.

Result 7.6. Let Z have full rank r + 1 and ε be distributed as Nₙ(0, σ²I). The likelihood ratio test of H₀: β₍₂₎ = 0 is equivalent to a test of H₀ based on the extra sum of squares in (7-16) and s² = (y − Zβ̂)′(y − Zβ̂)/(n − r − 1). In particular, the likelihood ratio test rejects H₀ if

$$ \frac{(SS_{\mathrm{res}}(Z_1) - SS_{\mathrm{res}}(Z))/(r - q)}{s^2} > F_{r-q,\,n-r-1}(\alpha) $$


where F_{r−q,n−r−1}(α) is the upper (100α)th percentile of an F-distribution with r − q and n − r − 1 d.f.

Proof. Given the data and the normal assumption, the likelihood associated with the parameters β and σ² is

$$ L(\beta, \sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n} e^{-(y - Z\beta)'(y - Z\beta)/2\sigma^2} \le \frac{1}{(2\pi)^{n/2}\hat\sigma^n} e^{-n/2} $$

with the maximum occurring at β̂ = (Z′Z)⁻¹Z′y and σ̂² = (y − Zβ̂)′(y − Zβ̂)/n.

Under the restriction of the null hypothesis, Y = Z₁β₍₁₎ + ε and

$$ \max_{\beta_{(1)}, \sigma^2} L(\beta_{(1)}, \sigma^2) = \frac{1}{(2\pi)^{n/2}\hat\sigma_1^n} e^{-n/2} $$

where the maximum occurs at β̂₍₁₎ = (Z₁′Z₁)⁻¹Z₁′y. Moreover,

σ̂₁² = (y − Z₁β̂₍₁₎)′(y − Z₁β̂₍₁₎)/n

Rejecting H₀: β₍₂₎ = 0 for small values of the likelihood ratio

$$ \frac{\max_{\beta_{(1)}, \sigma^2} L(\beta_{(1)}, \sigma^2)}{\max_{\beta, \sigma^2} L(\beta, \sigma^2)} = \left( \frac{\hat\sigma_1^2}{\hat\sigma^2} \right)^{-n/2} = \left( 1 + \frac{\hat\sigma_1^2 - \hat\sigma^2}{\hat\sigma^2} \right)^{-n/2} $$

is equivalent to rejecting H₀ for large values of (σ̂₁² − σ̂²)/σ̂² or its scaled version,

$$ F = \frac{n(\hat\sigma_1^2 - \hat\sigma^2)/(r - q)}{n\hat\sigma^2/(n - r - 1)} = \frac{(SS_{\mathrm{res}}(Z_1) - SS_{\mathrm{res}}(Z))/(r - q)}{s^2} $$

The preceding F-ratio has an F-distribution with r − q and n − r − 1 d.f. (See [22] or Result 7.11 with m = 1.)  ■

Comment. The likelihood ratio test is implemented as follows. To test whether all coefficients in a subset are zero, fit the model with and without the terms corresponding to these coefficients. The improvement in the residual sum of squares (the extra sum of squares) is compared to the residual sum of squares for the full model via the F-ratio. The same procedure applies even in analysis of variance situations where Z is not of full rank.⁴

More generally, it is possible to formulate null hypotheses concerning r − q linear combinations of β of the form H₀: Cβ = A₀. Let the (r − q) × (r + 1) matrix C have full rank. (This null hypothesis reduces to the previous choice when C = [0 ⋮ I₍ᵣ₋q₎ₓ₍ᵣ₋q₎] and A₀ = 0.) Under the full model, Cβ̂ is distributed as N_{r−q}(Cβ, σ²C(Z′Z)⁻¹C′).

⁴In situations where Z is not of full rank, rank(Z) replaces r + 1 and rank(Z₁) replaces q + 1 in Result 7.6.

We reject H₀: Cβ = A₀ at level α if A₀ does not lie in the 100(1 − α)% confidence ellipsoid for Cβ. Equivalently, we reject H₀: Cβ = A₀ if

$$ \frac{(C\hat\beta - A_0)'(C(Z'Z)^{-1}C')^{-1}(C\hat\beta - A_0)}{s^2} > (r - q)\, F_{r-q,\,n-r-1}(\alpha) $$  (7-17)

where s² = (y − Zβ̂)′(y − Zβ̂)/(n − r − 1) and F_{r−q,n−r−1}(α) is the upper (100α)th percentile of an F-distribution with r − q and n − r − 1 d.f. The test in (7-17) is the likelihood ratio test, and the numerator in the F-ratio is the extra residual sum of squares incurred by fitting the model subject to the restriction that Cβ = A₀. (See [22].)

The next example illustrates just how easily unbalanced experimental designs are handled by the general theory just described.
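A direct transcription of the rejection rule (7-17) follows (a sketch; the function and argument names are ours). For the subset hypothesis (7-15), C consists of zeros and ones picking out β_{q+1}, …, β_r, and A₀ = 0.

```python
import numpy as np
from scipy import stats

def general_linear_test(Z, y, C, A0, alpha=0.05):
    """Test H0: C beta = A0 via (7-17); Z must have full rank."""
    n, p = Z.shape                             # p = r + 1
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta_hat = ZtZ_inv @ Z.T @ y
    resid = y - Z @ beta_hat
    s2 = resid @ resid / (n - p)               # s^2 on n - r - 1 d.f.
    d = C @ beta_hat - A0
    lhs = d @ np.linalg.solve(C @ ZtZ_inv @ C.T, d) / s2
    rhs = C.shape[0] * stats.f.ppf(1 - alpha, C.shape[0], n - p)
    return lhs, rhs, lhs > rhs                 # reject H0 when lhs > rhs
```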
Example 7.5 (Testing the importance of additional predictors using the extra sum-of-squares approach)

Male and female patrons rated the service in three establishments (locations) of a large restaurant chain. The service ratings were converted into an index. Table 7.2 contains the data for n = 18 customers. Each data point in the table is categorized according to location (1, 2, or 3) and gender (male = 0 and female = 1). This categorization has the format of an unbalanced two-way table with unequal numbers of observations per cell. For instance, locations 1 and 2 each provide five male responses but only two female responses, and location 3 provides two responses of each gender. Introducing three dummy variables to account for location and two dummy variables to account for gender, we can develop a regression model linking the service index Y to location, gender, and their "interaction" using the design matrix below.

TABLE 7.2  RESTAURANT-SERVICE DATA

Location | Gender | Service (Y)
   1     |   0    |   15.2
   1     |   0    |   21.2
   1     |   0    |   27.3
   1     |   0    |   21.2
   1     |   0    |   21.2
   1     |   1    |   36.4
   1     |   1    |   92.4
   2     |   0    |   27.3
   2     |   0    |   15.2
   2     |   0    |    9.1
   2     |   0    |   18.2
   2     |   0    |   50.0
   2     |   1    |   44.0
   2     |   1    |   63.6
   3     |   0    |   15.2
   3     |   0    |   30.3
   3     |   1    |   36.4
   3     |   1    |   40.9

The 18 × 12 design matrix Z has columns for the constant, location, gender, and interaction; its six distinct rows, with multiplicities, are

constant | location | gender | interaction
1  1 0 0  1 0  1 0 0 0 0 0   } 5 responses
1  1 0 0  0 1  0 1 0 0 0 0   } 2 responses
1  0 1 0  1 0  0 0 1 0 0 0   } 5 responses
1  0 1 0  0 1  0 0 0 1 0 0   } 2 responses
1  0 0 1  1 0  0 0 0 0 1 0   } 2 responses
1  0 0 1  0 1  0 0 0 0 0 1   } 2 responses

The coefficient vector can be set out as

β′ = [β₀, β₁, β₂, β₃, τ₁, τ₂, γ₁₁, γ₁₂, γ₂₁, γ₂₂, γ₃₁, γ₃₂]

where the βᵢ (i > 0) represent the effects of the locations on the determination of service, the τᵢ represent the effects of gender on the service index, and the γᵢₖ represent the location–gender interaction effects.

The design matrix Z is not of full rank. (For instance, column 1 equals the sum of columns 2–4 or columns 5–6.) In fact, rank(Z) = 6. For the complete model, a computer program gives

SS_res(Z) = 2977.4

and n − rank(Z) = 18 − 6 = 12. The model without the interaction terms has the design matrix Z₁ consisting of the first six columns of Z. We find that

SS_res(Z₁) = 3419.1

with n − rank(Z₁) = 18 − 4 = 14. To test H₀: γ₁₁ = γ₁₂ = γ₂₁ = γ₂₂ = γ₃₁ = γ₃₂ = 0 (no location–gender interaction), we compute

$$ F = \frac{(SS_{\mathrm{res}}(Z_1) - SS_{\mathrm{res}}(Z))/(6 - 4)}{s^2} = \frac{(SS_{\mathrm{res}}(Z_1) - SS_{\mathrm{res}}(Z))/2}{SS_{\mathrm{res}}(Z)/12} = \frac{(3419.1 - 2977.4)/2}{2977.4/12} = .89 $$

The F-ratio may be compared with an appropriate percentage point of an F-distribution with 2 and 12 d.f. This F-ratio is not significant for any reasonable significance level α. Consequently, we conclude that the service index does not depend upon any location–gender interaction, and these terms can be dropped from the model.

Using the extra sum-of-squares approach, we may verify that there is no difference between locations (no location effect), but that gender is significant; that is, males and females do not give the same ratings to service.

In analysis-of-variance situations where the cell counts are unequal, the variation in the response attributable to different predictor variables and their interactions cannot usually be separated into independent amounts. To evaluate the relative influences of the predictors on the response in this case, it is necessary to fit the model with and without the terms in question and compute the appropriate F-test statistics.  ■
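The extra sum-of-squares computation for this example can be checked in a few lines (a sketch using the Table 7.2 data; lstsq handles the rank-deficient design matrices, since only the fitted values, not the coefficients, are needed).

```python
import numpy as np

# Table 7.2 in cell order: (loc 1, male) x5, (1, female) x2, (2, male) x5,
# (2, female) x2, (3, male) x2, (3, female) x2.
loc    = np.repeat([1, 1, 2, 2, 3, 3], [5, 2, 5, 2, 2, 2])
gender = np.repeat([0, 1, 0, 1, 0, 1], [5, 2, 5, 2, 2, 2])
y = np.array([15.2, 21.2, 27.3, 21.2, 21.2, 36.4, 92.4,
              27.3, 15.2, 9.1, 18.2, 50.0, 44.0, 63.6,
              15.2, 30.3, 36.4, 40.9])

L = (loc[:, None] == np.array([1, 2, 3])).astype(float)   # location dummies
G = (gender[:, None] == np.array([0, 1])).astype(float)   # gender dummies
LG = (L[:, :, None] * G[:, None, :]).reshape(18, 6)       # interaction dummies

Z1 = np.column_stack([np.ones(18), L, G])   # first six columns (rank 4)
Z = np.column_stack([Z1, LG])               # full model (rank 6)

ssres = lambda M: np.sum((y - M @ np.linalg.lstsq(M, y, rcond=None)[0])**2)
F = ((ssres(Z1) - ssres(Z)) / (6 - 4)) / (ssres(Z) / (18 - 6))
print(ssres(Z), ssres(Z1), F)   # approx 2977.4, 3419.1, .89
```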



7.5

INFERENCES FROM THE ESTIMATED REGRESSION FUNCTION

Once an investigator is satisfied with the fitted regression model, it can be used to solve two prediction problems. Let z₀′ = [1, z₀₁, …, z₀ᵣ] be selected values for the predictor variables. Then z₀ and β̂ can be used (1) to estimate the regression function β₀ + β₁z₀₁ + ⋯ + β_r z₀ᵣ at z₀ and (2) to estimate the value of the response Y at z₀.

Estimating the Regression Function at z₀

Let Y₀ denote the value of the response when the predictor variables have values z₀′ = [1, z₀₁, …, z₀ᵣ]. According to the model in (7-3), the expected value of Y₀ is

E(Y₀ | z₀) = β₀ + β₁z₀₁ + ⋯ + β_r z₀ᵣ = z₀′β

Its least squares estimate is z₀′β̂.

Result 7.7. For the linear regression model in (7-3), z₀′β̂ is the unbiased linear estimator of E(Y₀ | z₀) with minimum variance, Var(z₀′β̂) = z₀′(Z′Z)⁻¹z₀σ². If the errors ε are normally distributed, then a 100(1 − α)% confidence interval for E(Y₀ | z₀) = z₀′β is provided by

$$ z_0'\hat\beta \pm t_{n-r-1}\left(\frac{\alpha}{2}\right) \sqrt{z_0'(Z'Z)^{-1}z_0 \; s^2} $$  (7-18)

where t_{n−r−1}(α/2) is the upper 100(α/2)th percentile of a t-distribution with n − r − 1 d.f.

Proof. For a fixed z₀, z₀′β̂ is just a linear combination of the β̂ᵢ's, so Result 7.3 applies. Also, Var(z₀′β̂) = z₀′ Cov(β̂) z₀ = z₀′(Z′Z)⁻¹z₀σ², since Cov(β̂) = σ²(Z′Z)⁻¹ by Result 7.2. Under the further assumption that ε is normally distributed, Result 7.4 asserts that β̂ is N_{r+1}(β, σ²(Z′Z)⁻¹) independently of s²/σ², which is distributed as χ²₍ₙ₋ᵣ₋₁₎/(n − r − 1). Consequently, the linear combination z₀′β̂ is N(z₀′β, σ²z₀′(Z′Z)⁻¹z₀) and

$$ \frac{(z_0'\hat\beta - z_0'\beta)/\sqrt{\sigma^2 z_0'(Z'Z)^{-1}z_0}}{\sqrt{s^2/\sigma^2}} = \frac{z_0'\hat\beta - z_0'\beta}{\sqrt{s^2 z_0'(Z'Z)^{-1}z_0}} $$

is distributed as t_{n−r−1}. The confidence interval follows.  ■
Forecasting a New Observation at z₀

Prediction of a new observation, such as Y₀, at z₀′ = [1, z₀₁, …, z₀ᵣ] is more uncertain than estimating the expected value of Y₀. According to the regression model of (7-3),

Y₀ = z₀′β + ε₀

or

(new response Y₀) = (expected value of Y₀ at z₀) + (new error)

where ε₀ is distributed as N(0, σ²) and is independent of ε and, hence, of β̂ and s². The errors ε influence the estimators β̂ and s² through the responses Y, but ε₀ does not.

Result 7.8. Given the linear regression model of (7-3), a new observation Y₀ has the unbiased predictor

z₀′β̂ = β̂₀ + β̂₁z₀₁ + ⋯ + β̂_r z₀ᵣ

The variance of the forecast error Y₀ − z₀′β̂ is

Var(Y₀ − z₀′β̂) = σ²(1 + z₀′(Z′Z)⁻¹z₀)

When the errors ε have a normal distribution, a 100(1 − α)% prediction interval for Y₀ is given by

$$ z_0'\hat\beta \pm t_{n-r-1}\left(\frac{\alpha}{2}\right) \sqrt{s^2 (1 + z_0'(Z'Z)^{-1}z_0)} $$

where t_{n−r−1}(α/2) is the upper 100(α/2)th percentile of a t-distribution with n − r − 1 degrees of freedom.

Proof. We forecast Y₀ by z₀′β̂, which estimates E(Y₀ | z₀). By Result 7.7, z₀′β̂ has E(z₀′β̂) = z₀′β and Var(z₀′β̂) = z₀′(Z′Z)⁻¹z₀σ². The forecast error is then

Y₀ − z₀′β̂ = z₀′β + ε₀ − z₀′β̂ = ε₀ + z₀′(β − β̂)

Thus, E(Y₀ − z₀′β̂) = E(ε₀) + E(z₀′(β − β̂)) = 0, so the predictor is unbiased. Since ε₀ and β̂ are independent,

Var(Y₀ − z₀′β̂) = Var(ε₀) + Var(z₀′β̂) = σ² + z₀′(Z′Z)⁻¹z₀σ² = σ²(1 + z₀′(Z′Z)⁻¹z₀)

If it is further assumed that ε has a normal distribution, then β̂ is normally distributed, and so is the linear combination Y₀ − z₀′β̂. Consequently, (Y₀ − z₀′β̂)/√(σ²(1 + z₀′(Z′Z)⁻¹z₀)) is distributed as N(0, 1). Dividing this ratio by √(s²/σ²), which is distributed as √(χ²₍ₙ₋ᵣ₋₁₎/(n − r − 1)), we obtain

$$ \frac{Y_0 - z_0'\hat\beta}{\sqrt{s^2 (1 + z_0'(Z'Z)^{-1}z_0)}} $$

which is distributed as t_{n−r−1}. The prediction interval follows immediately.  ■




The prediction interval for Y₀ is wider than the confidence interval for estimating the value of the regression function E(Y₀ | z₀) = z₀′β. The additional uncertainty in forecasting Y₀, which is represented by the extra term s² in the expression s²(1 + z₀′(Z′Z)⁻¹z₀), comes from the presence of the unknown error term ε₀.
Example 7.6 (Interval estimates for a mean response and a future response)

Companies considering the purchase of a computer must first assess their future needs in order to determine the proper equipment. A computer scientist collected data from seven similar company sites so that a forecast equation of computer-hardware requirements for inventory management could be developed. The data are given in Table 7.3 for

z₁ = customer orders (in thousands)
z₂ = add–delete item count (in thousands)
Y = CPU (central processing unit) time (in hours)

TABLE 7.3  COMPUTER DATA

z₁ (Orders) | z₂ (Add–delete items) | y (CPU time)
  123.5     |        2.108          |    141.5
  146.1     |        9.213          |    168.9
  133.9     |        1.905          |    154.8
  128.5     |         .815          |    146.5
  151.5     |        1.061          |    172.8
  136.2     |        8.603          |    160.1
   92.0     |        1.125          |    108.5

Source: Data taken from H. P. Artis, Forecasting Computer Requirements: A Forecaster's Dilemma (Piscataway, NJ: Bell Laboratories, 1979).

Construct a 95% confidence interval for the mean CPU time, E(Y₀ | z₀) = β₀ + β₁z₀₁ + β₂z₀₂, at z₀′ = [1, 130, 7.5]. Also, find a 95% prediction interval for a new facility's CPU requirement corresponding to the same z₀.

A computer program provides the estimated regression function

ŷ = 8.42 + 1.08z₁ + .42z₂

$$ (Z'Z)^{-1} = \begin{bmatrix} 8.17969 & & \\ -.06411 & .00052 & \\ .08831 & -.00107 & .01440 \end{bmatrix} \quad \text{(symmetric; lower triangle shown)} $$

and s = 1.204. Consequently,

z₀′β̂ = 8.42 + 1.08(130) + .42(7.5) = 151.97

and s√(z₀′(Z′Z)⁻¹z₀) = 1.204(.58928) = .71. We have t₄(.025) = 2.776, so the 95% confidence interval for the mean CPU time at z₀ is

z₀′β̂ ± t₄(.025)s√(z₀′(Z′Z)⁻¹z₀) = 151.97 ± 2.776(.71)

or (150.00, 153.94).

Since s√(1 + z₀′(Z′Z)⁻¹z₀) = (1.204)(1.16071) = 1.40, a 95% prediction interval for the CPU time at a new facility with conditions z₀ is

z₀′β̂ ± t₄(.025)s√(1 + z₀′(Z′Z)⁻¹z₀) = 151.97 ± 2.776(1.40)

or (148.08, 155.86).  ■
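The whole of Example 7.6 fits in a few lines (a sketch; scipy.stats supplies the t percentile). The printed intervals should match (150.00, 153.94) and (148.08, 155.86) up to rounding.

```python
import numpy as np
from scipy import stats

z1 = np.array([123.5, 146.1, 133.9, 128.5, 151.5, 136.2, 92.0])
z2 = np.array([2.108, 9.213, 1.905, 0.815, 1.061, 8.603, 1.125])
y = np.array([141.5, 168.9, 154.8, 146.5, 172.8, 160.1, 108.5])

Z = np.column_stack([np.ones(7), z1, z2])
ZtZ_inv = np.linalg.inv(Z.T @ Z)
beta_hat = ZtZ_inv @ Z.T @ y
resid = y - Z @ beta_hat
s = np.sqrt(resid @ resid / 4)                # n - r - 1 = 4 d.f.; approx 1.204

z0 = np.array([1.0, 130.0, 7.5])
y0_hat = z0 @ beta_hat                        # approx 151.97
t = stats.t.ppf(0.975, 4)                     # t_4(.025) = 2.776
ci = t * s * np.sqrt(z0 @ ZtZ_inv @ z0)       # half-width of (7-18)
pi = t * s * np.sqrt(1 + z0 @ ZtZ_inv @ z0)   # half-width of the prediction interval
print(y0_hat - ci, y0_hat + ci)               # approx (150.00, 153.94)
print(y0_hat - pi, y0_hat + pi)               # approx (148.08, 155.86)
```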

7.6

MODEL CHECKING AND OTHER ASPECTS OF REGRESSION
Does the Model Fit?

Assuming that the model is "correct," we have used the estimated regression function
to make inferences. Of course, it is imperative to examine the adequacy of the model

before the estimated function becomes a permanent part of the decision-making
apparatus.
All the sample information on lack of fit is contained in the residuals

ε̂₁ = y₁ − β̂₀ − β̂₁z₁₁ − ⋯ − β̂_r z₁ᵣ
ε̂₂ = y₂ − β̂₀ − β̂₁z₂₁ − ⋯ − β̂_r z₂ᵣ
⋮
ε̂ₙ = yₙ − β̂₀ − β̂₁zₙ₁ − ⋯ − β̂_r zₙᵣ

or

$$ \hat\varepsilon = [I - Z(Z'Z)^{-1}Z']y = [I - H]y $$  (7-19)

If the model is valid, each residual ε̂ⱼ is an estimate of the error εⱼ, which is assumed to be a normal random variable with mean zero and variance σ². Although the residuals ε̂ have expected value 0, their covariance matrix σ²[I − Z(Z′Z)⁻¹Z′] = σ²[I − H] is not diagonal. Residuals have unequal variances and nonzero correlations. Fortunately, the correlations are often small and the variances are nearly equal.

Because the residuals ε̂ have covariance matrix σ²[I − H], the variances of the ε̂ⱼ can vary greatly if the diagonal elements of H, the leverages hⱼⱼ, are substantially different. Consequently, many statisticians prefer graphical diagnostics based on studentized residuals. Using the residual mean square s² as an estimate of σ², we have

$$ \widehat{\operatorname{Var}}(\hat\varepsilon_j) = s^2 (1 - h_{jj}), \qquad j = 1, 2, \ldots, n $$  (7-20)

and the studentized residuals are

$$ \hat\varepsilon_j^* = \frac{\hat\varepsilon_j}{\sqrt{s^2 (1 - h_{jj})}}, \qquad j = 1, 2, \ldots, n $$  (7-21)

We expect the studentized residuals to look, approximately, like independent drawings from an N(0, 1) distribution. Some software packages go one step further and studentize ε̂ⱼ using the delete-one estimated variance s²(j), which is the residual mean square when the jth observation is dropped from the analysis.
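A sketch computing leverages and the studentized residuals of (7-21) for any full-rank design matrix Z and response y.

```python
import numpy as np

def studentized_residuals(Z, y):
    n, p = Z.shape
    H = Z @ np.linalg.solve(Z.T @ Z, Z.T)      # hat matrix
    resid = (np.eye(n) - H) @ y                # residuals, as in (7-19)
    s2 = resid @ resid / (n - p)               # residual mean square
    h = np.diag(H)                             # leverages h_jj
    return resid / np.sqrt(s2 * (1 - h))       # studentized residuals, (7-21)
```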



Residuals should be plotted in various ways to detect possible anomalies. For general diagnostic purposes, the following are useful graphs:

1. Plot the residuals ε̂ⱼ against the predicted values ŷⱼ = β̂₀ + β̂₁zⱼ₁ + ⋯ + β̂_r zⱼᵣ. Departures from the assumptions of the model are typically indicated by two types of phenomena:
(a) A dependence of the residuals on the predicted value. This is illustrated in Figure 7.2(a). The numerical calculations are incorrect, or a β₀ term has been omitted from the model.
(b) The variance is not constant. The pattern of residuals may be funnel shaped, as in Figure 7.2(b), so that there is large variability for large ŷ and small variability for small ŷ. If this is the case, the variance of the error is not constant, and transformations or a weighted least squares approach (or both) are required. (See Exercise 7.3.) In Figure 7.2(d), the residuals form a horizontal band. This is ideal and indicates equal variances and no dependence on ŷ.
2. Plot the residuals ε̂ⱼ against a predictor variable, such as z₁, or products of predictor variables, such as z₁² or z₁z₂. A systematic pattern in these plots suggests the need for more terms in the model. This situation is illustrated in Figure 7.2(c).
3. Q–Q plots and histograms. Do the errors appear to be normally distributed? To answer this question, the residuals ε̂ⱼ or ε̂ⱼ* can be examined using the techniques discussed in Section 4.6. The Q–Q plots, histograms, and dot diagrams help to detect the presence of unusual observations or severe departures from normality that may require special attention in the analysis. If n is large, minor departures from normality will not greatly affect inferences about β.

[Figure 7.2  Residual plots: panels (a)–(d) of residuals against ŷ, as described above.]


