Lecture Undergraduate econometrics - Chapter 4: Properties of the least squares estimators


Chapter 4
Properties of the Least Squares Estimators
Assumptions of the Simple Linear Regression Model

SR1.  yt = β1 + β2xt + et

SR2.  E(et) = 0  ⇔  E(yt) = β1 + β2xt

SR3.  var(et) = σ² = var(yt)

SR4.  cov(ei, ej) = cov(yi, yj) = 0

SR5.  xt is not random and takes at least two values

SR6.  et ~ N(0, σ²)  ⇔  yt ~ N(β1 + β2xt, σ²)  (optional)

Slide 4.1, Undergraduate Econometrics, 2nd Edition, Chapter 4



4.1 The Least Squares Estimators as Random Variables

To repeat an important passage from Chapter 3, when the formulas for b1 and b2, given in
Equation (3.3.8), are taken to be rules that are used whatever the sample data turn out to
be, then b1 and b2 are random variables since their values depend on the random variable
y whose values are not known until the sample is collected. In this context we call b1 and
b2 the least squares estimators. When actual sample values, numbers, are substituted
into the formulas, we obtain numbers that are values of random variables. In this context
we call b1 and b2 the least squares estimates.



4.2 The Sampling Properties of the Least Squares Estimators

The means (expected values) and variances of random variables provide information
about the location and spread of their probability distributions (see Chapter 2.3). As such,
the means and variances of b1 and b2 provide information about the range of values that
b1 and b2 are likely to take. Knowing this range is important, because our objective is to
obtain estimates that are close to the true parameter values. Since b1 and b2 are random
variables, they may have covariance, and this we will determine as well. These “pre-data” characteristics of b1 and b2 are called sampling properties, because the
randomness of the estimators is brought on by sampling from a population.




4.2.1 The Expected Values of b1 and b2
• The least squares estimator b2 of the slope parameter β2, based on a sample of T
observations, is

b2 = [ T ∑ xt yt − ∑ xt ∑ yt ] / [ T ∑ xt² − (∑ xt)² ]        (3.3.8a)

• The least squares estimator b1 of the intercept parameter β1 is
b1 = ȳ − b2 x̄        (3.3.8b)

where ȳ = ∑ yt / T and x̄ = ∑ xt / T are the sample means of the observations on y
and x, respectively.



• We begin by rewriting the formula in Equation (3.3.8a) into the following one that is more convenient for theoretical purposes:
b2 = β2 + ∑ wt et

(4.2.1)

where wt is a constant (non-random) given by

wt = (xt − x̄) / ∑ (xt − x̄)²        (4.2.2)

Since wt is a constant, depending only on the values of xt, we can find the expected
value of b2 using the fact that the expected value of a sum is the sum of the expected
values (see Chapter 2.5.1):


E(b2) = E( β2 + ∑ wt et ) = E(β2) + ∑ E(wt et)
      = β2 + ∑ wt E(et) = β2        [since E(et) = 0]        (4.2.3)

When the expected value of any estimator of a parameter equals the true parameter
value, then that estimator is unbiased. Since E(b2) = β2, the least squares estimator b2
is an unbiased estimator of β2. If many samples of size T are collected, and the
formula (3.3.8a) for b2 is used to estimate β2, then the average value of the estimates b2
obtained from all those samples will be β2, if the statistical model assumptions are
correct.
• However, if the assumptions we have made are not correct, then the least squares
estimator may not be unbiased. In Equation (4.2.3) note in particular the role of the
assumptions SR1 and SR2. The assumption that E(et) = 0, for each and every t, makes ∑ wt E(et) = 0 and hence E(b2) = β2. If E(et) ≠ 0, then E(b2) ≠ β2. Recall that et contains, among other things, factors affecting yt that are omitted from the economic model. If we have omitted anything that is important, then we would expect that E(et) ≠ 0 and
E(b2) ≠ β2. Thus, having an econometric model that is correctly specified, in the sense
that it includes all relevant explanatory variables, is a must in order for the least
squares estimators to be unbiased.
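The role of SR2 can be illustrated by simulation. In the hypothetical sketch below, the same least squares rule is applied to a correctly specified model and to one in which an omitted factor (deliberately constructed here as 0.5·xt) is absorbed into the error term, so the effective error no longer has mean zero given x. All parameter values are illustrative, not from the text.

```python
import random

random.seed(2)
T = 50
beta1, beta2 = 1.0, 2.0   # illustrative true parameters

def slope(x, y):
    """Least squares slope in deviation-from-the-mean form."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    num = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y))
    den = sum((xt - xbar) ** 2 for xt in x)
    return num / den

slopes_ok, slopes_biased = [], []
for _ in range(2000):
    x = [random.uniform(0, 10) for _ in range(T)]
    e = [random.gauss(0, 1) for _ in range(T)]
    # Correctly specified: the error has mean zero, so b2 is unbiased.
    y = [beta1 + beta2 * xt + et for xt, et in zip(x, e)]
    slopes_ok.append(slope(x, y))
    # Omitted variable 0.5*x absorbed into the error: the effective
    # error 0.5*x + e does not have zero mean given x, so b2 is biased.
    y_bad = [beta1 + beta2 * xt + 0.5 * xt + et for xt, et in zip(x, e)]
    slopes_biased.append(slope(x, y_bad))

print(sum(slopes_ok) / 2000)       # close to the true 2.0
print(sum(slopes_biased) / 2000)   # close to 2.5, not 2.0
```

Averaged over many samples, the correctly specified estimator centers on β2 = 2, while the misspecified one centers on 2.5: the least squares procedure is only unbiased when the model assumptions hold.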
• The unbiasedness of the estimator b2 is an important sampling property. When sampling repeatedly from a population, the least squares estimator is “correct,” on average, and this is one desirable property of an estimator. This statistical property by
itself does not mean that b2 is a good estimator of β2, but it is part of the story. The
unbiasedness property depends on having many samples of data from the same
population. The fact that b2 is unbiased does not imply anything about what might
happen in just one sample. An individual estimate (number) b2 may be near to, or far
from β2. Since β2 is never known, we will never know, given one sample, whether our estimate is “close” to β2 or not. The least squares estimator b1 of β1 is also an
unbiased estimator, and E(b1) = β1.

4.2.1a The Repeated Sampling Context
• To illustrate unbiased estimation in a slightly different way, we present in Table 4.1
least squares estimates of the food expenditure model from 10 random samples of size
T = 40 from the same population. Note the variability of the least squares parameter
estimates from sample to sample. This sampling variation is due to the simple fact
that we obtained 40 different households in each sample, and their weekly food
expenditure varies randomly.



Table 4.1 Least Squares Estimates from 10 Random Samples of Size T = 40

 n        b1        b2
 1    51.1314    0.1442
 2    61.2045    0.1286
 3    40.7882    0.1417
 4    80.1396    0.0886
 5    31.0110    0.1669
 6    54.3099    0.1086
 7    69.6749    0.1003
 8    71.1541    0.1009
 9    18.8290    0.1758
10    36.1433    0.1626
• The property of unbiasedness is about the average values of b1 and b2 if many samples
of the same size are drawn from the same population. The average value of b1 in these
10 samples is b̄1 = 51.43859; the average value of b2 is b̄2 = 0.13182. If we took the averages of estimates from many samples, these averages would approach the true parameter values β1 and β2. Unbiasedness does not say that an estimate from any one sample is close to the true parameter value, and thus we cannot say that an estimate is
unbiased. We can say that the least squares estimation procedure (or the least squares
estimator) is unbiased.
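A repeated-sampling experiment in the spirit of Table 4.1 can be sketched as follows. The population parameters and the regressor design below are invented for illustration; they are not the food expenditure data used in the text.

```python
import random

random.seed(3)
beta1, beta2 = 40.0, 0.13   # illustrative values in the spirit of Table 4.1

def ols(x, y):
    """Least squares estimates (b1, b2) for one sample."""
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
         / sum((a - xbar) ** 2 for a in x)
    return ybar - b2 * xbar, b2

def draw_sample(T=40):
    x = [random.uniform(100, 800) for _ in range(T)]   # "income"-like regressor
    y = [beta1 + beta2 * xt + random.gauss(0, 30) for xt in x]
    return x, y

# Estimates vary from sample to sample, as in Table 4.1...
for n in range(3):
    print(ols(*draw_sample()))

# ...but their average over many samples settles near (beta1, beta2).
ests = [ols(*draw_sample()) for _ in range(5000)]
print(sum(b1 for b1, _ in ests) / 5000, sum(b2 for _, b2 in ests) / 5000)
```

The first few lines show the sampling variation; the final line shows the averages drifting toward the true parameter values, which is exactly what unbiasedness promises about the procedure, not about any single estimate.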

4.2.1b Derivation of Equation 4.2.1
• In this section we show that Equation (4.2.1) is correct. The first step in the conversion of the formula for b2 into Equation (4.2.1) is to use some tricks involving summation signs. The first useful fact is that

 1

− 2x ∑ xt + T x 2 = ∑ xt2 − 2 x  T ∑ xt  + T x 2
 T

= ∑ xt2 − 2T x 2 + T x 2 = ∑ xt2 − T x 2


∑ (x − x ) = ∑ x
2

t

2
t

(4.2.4a)



Then, starting from Equation (4.2.4a),

∑ (xt − x̄)² = ∑ xt² − T x̄² = ∑ xt² − x̄ ∑ xt = ∑ xt² − (∑ xt)² / T        (4.2.4b)

To obtain this result we have used the fact that x̄ = ∑ xt / T, so ∑ xt = T x̄.

• The second useful fact is

∑ (xt − x̄)(yt − ȳ) = ∑ xt yt − T x̄ ȳ = ∑ xt yt − (∑ xt)(∑ yt) / T        (4.2.5)

This result is proven in a similar manner by using Equation (4.2.4b).



• If the numerator and denominator of b2 in Equation (3.3.8a) are divided by T, then using Equations (4.2.4) and (4.2.5) we can rewrite b2 in deviation-from-the-mean form as

b2 = ∑ (xt − x̄)(yt − ȳ) / ∑ (xt − x̄)²        (4.2.6)

This formula for b2 is one that you should remember, as we will use it time and time
again in the next few chapters. Its primary advantage is its theoretical usefulness.
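The algebraic equivalence of Equations (3.3.8a) and (4.2.6) is easy to confirm numerically on arbitrary made-up data:

```python
# Numerical check (on invented data) that the deviation-from-the-mean
# form (4.2.6) gives the same b2 as the original formula (3.3.8a).
x = [2.0, 3.0, 5.0, 7.0, 11.0]
y = [1.0, 4.0, 2.0, 8.0, 9.0]
T = len(x)
xbar, ybar = sum(x) / T, sum(y) / T

# Equation (3.3.8a): raw sums.
b2_orig = (T * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)) \
          / (T * sum(a * a for a in x) - sum(x) ** 2)
# Equation (4.2.6): deviations from the means.
b2_dev = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
         / sum((a - xbar) ** 2 for a in x)

print(abs(b2_orig - b2_dev) < 1e-12)   # True: the two formulas agree
```

The two expressions differ only by a common factor of T in numerator and denominator, so they agree to floating-point precision on any sample.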
• The sum of the deviations of any variable about its average is zero, that is,

∑ (xt − x̄) = 0        (4.2.7)

• Then, the formula for b2 becomes


b2 = ∑ (xt − x̄)(yt − ȳ) / ∑ (xt − x̄)²

   = [ ∑ (xt − x̄) yt − ȳ ∑ (xt − x̄) ] / ∑ (xt − x̄)²

   = ∑ (xt − x̄) yt / ∑ (xt − x̄)²        ( using ∑ (xt − x̄) = 0 )

   = ∑ [ (xt − x̄) / ∑ (xt − x̄)² ] yt = ∑ wt yt        (4.2.8)
where wt is the constant given in Equation (4.2.2).
• To obtain Equation (4.2.1), replace yt by yt = β1 + β2xt + et and simplify:

b2 = ∑ wt yt = ∑ wt (β1 + β2 xt + et ) = β1 ∑ wt + β2 ∑ wt xt + ∑ wt et

(4.2.9a)



First, ∑ wt = 0, and this eliminates the term β1 ∑ wt. Second, ∑ wt xt = 1 (by using Equation (4.2.4b)), so β2 ∑ wt xt = β2, and (4.2.9a) simplifies to Equation (4.2.1), which is what we wanted to show.

b2 = β2 + ∑ wt et
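The three facts just used, ∑ wt = 0, ∑ wt xt = 1, and b2 = β2 + ∑ wt et, can all be checked numerically. The sketch below uses invented data and illustrative parameter values:

```python
import random

random.seed(5)
x = [random.uniform(0, 10) for _ in range(30)]
xbar = sum(x) / len(x)
den = sum((xt - xbar) ** 2 for xt in x)
w = [(xt - xbar) / den for xt in x]   # the weights of Equation (4.2.2)

print(abs(sum(w)) < 1e-12)                                    # sum wt = 0
print(abs(sum(wt * xt for wt, xt in zip(w, x)) - 1) < 1e-12)  # sum wt*xt = 1

# And b2 = beta2 + sum wt*et, with illustrative beta1, beta2:
beta1, beta2 = 3.0, 0.7
e = [random.gauss(0, 1) for _ in x]
y = [beta1 + beta2 * xt + et for xt, et in zip(x, e)]
ybar = sum(y) / len(y)
b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) / den
print(abs(b2 - (beta2 + sum(wt * et for wt, et in zip(w, e)))) < 1e-9)
```

All three checks print True on any sample, since each identity is exact algebra rather than an approximation.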
The term ∑ wt = 0, because

∑ wt = ∑ [ (xt − x̄) / ∑ (xt − x̄)² ] = [ 1 / ∑ (xt − x̄)² ] ∑ (xt − x̄) = 0        (4.2.9b)

( using ∑ (xt − x̄) = 0 )

To show that ∑ wt xt = 1 we again use ∑ (xt − x̄) = 0. Another expression for ∑ (xt − x̄)² is


∑ (xt − x̄)² = ∑ (xt − x̄)(xt − x̄) = ∑ (xt − x̄) xt − x̄ ∑ (xt − x̄) = ∑ (xt − x̄) xt

Consequently,

∑ wt xt = ∑ (xt − x̄) xt / ∑ (xt − x̄)² = ∑ (xt − x̄) xt / ∑ (xt − x̄) xt = 1

4.2.2 The Variances and Covariance of b1 and b2
• The variance of the random variable b2 is the average of the squared distances between
the values of the random variable and its mean, which we now know is E(b2) = β2.
The variance (Chapter 2.3.4) of b2 is defined as


var(b2) = E[b2 − E(b2)]²
It measures the spread of the probability distribution of b2.
• In Figure 4.1 are graphs of two possible probability distributions of b2, f1(b2) and f2(b2), that have the same mean value but different variances. The probability density function f2(b2) has a smaller variance than the probability density function f1(b2).
Given a choice, we are interested in estimator precision and would prefer that b2 have
the probability distribution f2(b2) rather than f1(b2). With the distribution f2(b2) the
probability is more concentrated around the true parameter value β2, giving, relative to
f1(b2), a higher probability of getting an estimate that is close to β2. Remember,
getting an estimate close to β2 is our objective.

• The variance of an estimator measures the precision of the estimator in the sense that
it tells us how much the estimates produced by that estimator can vary from sample to
sample as illustrated in Table 4.1. Consequently, we often refer to the sampling variance or sampling precision of an estimator. The lower the variance of an estimator, the greater the sampling precision of that estimator. One estimator is more
precise than another estimator if its sampling variance is less than that of the other
estimator.
• If the regression model assumptions SR1-SR5 are correct (SR6 is not required), then
the variances and covariance of b1 and b2 are:





var(b1) = σ² [ ∑ xt² / ( T ∑ (xt − x̄)² ) ]

var(b2) = σ² / ∑ (xt − x̄)²                        (4.2.10)

cov(b1, b2) = σ² [ −x̄ / ∑ (xt − x̄)² ]
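As a concrete check, the sketch below evaluates the three formulas in (4.2.10) for a small made-up design of five x values, with an assumed error variance σ² = 4 (both illustrative, not from the text):

```python
# The formulas of Equation (4.2.10) evaluated on an invented design.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma2 = 4.0                                     # assumed error variance
T = len(x)
xbar = sum(x) / T
ssx = sum((xt - xbar) ** 2 for xt in x)          # sum of squared deviations

var_b1 = sigma2 * sum(xt * xt for xt in x) / (T * ssx)
var_b2 = sigma2 / ssx
cov_b1_b2 = sigma2 * (-xbar) / ssx

print(var_b1, var_b2, cov_b1_b2)   # 4.4 0.4 -1.2
```

Note that the covariance comes out negative because x̄ > 0 here, a point taken up in the discussion of the factors below.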

• Let us consider the factors that affect the variances and covariance in Equation
(4.2.10).
1. The variance of the random error term, σ2, appears in each of the expressions. It
reflects the dispersion of the values y about their mean E(y). The greater the
variance σ2, the greater is the dispersion, and the greater the uncertainty about



where the values of y fall relative to their mean E(y). The information we have
about β1 and β2 is less precise the larger is σ2. The larger the variance term σ2, the
greater the uncertainty there is in the statistical model, and the larger the variances
and covariance of the least squares estimators.
2. The sum of squares of the values of x about their sample mean, ∑ (xt − x̄)²,

appears in each of the variances and in the covariance. This expression measures
how spread out about their mean are the sample values of the independent or
explanatory variable x. The more they are spread out, the larger the sum of squares.
The less they are spread out, the smaller the sum of squares. The larger the sum of squares, ∑ (xt − x̄)², the smaller the variance of the least squares estimators and the

more precisely we can estimate the unknown parameters. The intuition behind this
is demonstrated in Figure 4.2. On the right, in panel (b), is a data scatter in which
the values of x are widely spread out along the x-axis. In panel (a) the data are


“bunched.” The data in panel (b) do a better job of determining where the least
squares line must fall, because they are more spread out along the x-axis.
3. The larger the sample size T, the smaller the variances and covariance of the least squares estimators; it is better to have more sample data than less. The sample size T appears in each of the variances and covariance because each of the sums consists of T terms. Also, T appears explicitly in var(b1). The sum of squares term ∑ (xt − x̄)² gets larger and larger as T increases because each of the terms in the sum is positive or zero (being zero if x happens to equal its sample mean value for an observation). Consequently, as T gets larger, both var(b2) and cov(b1, b2) get smaller, since the sum of squares appears in their denominator. The sums in the numerator and denominator of var(b1) both get larger as T gets larger and offset one another, leaving the T in the denominator as the dominant term, ensuring that var(b1) also gets smaller as T gets larger.



4. The term ∑ xt² appears in var(b1). The larger this term is, the larger the variance of the least squares estimator b1. Why is this so? Recall that the intercept parameter β1 is the expected value of y, given that x = 0. The farther our data are from x = 0, the more difficult it is to interpret β1, and the more difficult it is to estimate it accurately. The term ∑ xt² measures the distance of the data from the origin, x = 0. If the values of x are near zero, then ∑ xt² will be small and this will reduce var(b1). But if the values of x are large in magnitude, either positive or negative, then ∑ xt² will be large and var(b1) will be larger.
5. The sample mean of the x-values appears in cov(b1, b2). The covariance increases
the larger in magnitude is the sample mean x , and the covariance has the sign that
is opposite that of x . The reasoning here can be seen from Figure 4.2. In panel (b)
the least squares fitted line must pass through the point of the means. Given a fitted
line through the data, imagine the effect of increasing the estimated slope, b2. Since
the line must pass through the point of the means, the effect must be to lower the point where the line hits the vertical axis, implying a reduced intercept estimate b1.
Thus, when the sample mean is positive, as shown in Figure 4.2, there is a negative
covariance between the least squares estimators of the slope and intercept.
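A Monte Carlo sketch can verify this sign claim: with a fixed regressor whose sample mean is positive, the simulated covariance between b1 and b2 is negative and matches the formula in (4.2.10). All design choices below (parameter values, x = 1, …, 20) are illustrative.

```python
import random

random.seed(7)
beta1, beta2, sigma = 1.0, 2.0, 2.0   # illustrative values
x = [float(v) for v in range(1, 21)]  # fixed regressor, xbar > 0
T = len(x)
xbar = sum(x) / T
ssx = sum((xt - xbar) ** 2 for xt in x)

def ols(y):
    ybar = sum(y) / T
    b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) / ssx
    return ybar - b2 * xbar, b2

# Many samples from the same design, one (b1, b2) pair per sample.
pairs = [ols([beta1 + beta2 * xt + random.gauss(0, sigma) for xt in x])
         for _ in range(20000)]
m1 = sum(b1 for b1, _ in pairs) / len(pairs)
m2 = sum(b2 for _, b2 in pairs) / len(pairs)
cov_sim = sum((b1 - m1) * (b2 - m2) for b1, b2 in pairs) / len(pairs)

print(cov_sim)                        # negative, since xbar > 0
print(sigma ** 2 * (-xbar) / ssx)     # the formula value from (4.2.10)
```

Overstating the slope in one sample forces the fitted line to pivot around the point of the means and understate the intercept, which is exactly the negative dependence the simulation picks up.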
• Deriving the variance of b2:
The starting point is Equation (4.2.1).
var(b2) = var( β2 + ∑ wt et ) = var( ∑ wt et )        [since β2 is a constant]

        = ∑ wt² var(et)        [using cov(ei, ej) = 0]

        = σ² ∑ wt²             [using var(et) = σ²]

        = σ² / ∑ (xt − x̄)²     (4.2.11)



The very last step uses the fact that

∑ wt² = ∑ [ (xt − x̄)² / { ∑ (xt − x̄)² }² ] = ∑ (xt − x̄)² / { ∑ (xt − x̄)² }² = 1 / ∑ (xt − x̄)²        (4.2.12)

• Deriving the variance of b1:
From Equation (3.3.8b),

b1 = ȳ − b2 x̄ = (1/T) ∑ (β1 + β2 xt + et) − b2 x̄
   = β1 + β2 x̄ + ē − b2 x̄ = β1 − (b2 − β2) x̄ + ē

⇒ b1 − β1 = −(b2 − β2) x̄ + ē




Since E(b2) = β2 and E(ē) = 0, it follows that

E(b1) = β1 − (β2 − β2) x̄ + 0 = β1

Then

var(b1) = E[(b1 − β1)²] = E[(−(b2 − β2) x̄ + ē)²]
        = x̄² E[(b2 − β2)²] + E[ē²] − 2x̄ E[(b2 − β2) ē]
        = x̄² var(b2) + E[ē²] − 2x̄ E[(b2 − β2) ē]
Now



var(b2) = σ² / ∑ (xt − x̄)²

and

E(ē²) = E[ ((1/T) ∑ et)² ] = (1/T²) E[ (∑ et)² ]
      = (1/T²) E[ ∑ et² + cross-product terms in ei ej ]
      = (1/T²) { E[ ∑ et² ] + E[ cross-product terms in ei ej ] }
      = (1/T²) (T σ²) = σ² / T

(Note: var(et) = E[(et − E(et))²] = E[et²].)
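The result E(ē²) = σ²/T can also be checked by simulation, with illustrative values of T and σ:

```python
import random

random.seed(8)
T, sigma = 25, 3.0   # illustrative values
reps = 40000

# ebar = (1/T) * sum et; the derivation says E[ebar^2] = sigma^2 / T.
mean_sq = sum((sum(random.gauss(0, sigma) for _ in range(T)) / T) ** 2
              for _ in range(reps)) / reps

print(mean_sq)            # simulated E[ebar^2], near sigma^2 / T
print(sigma ** 2 / T)     # theoretical value, 0.36 here
```

The simulated average of ē² settles near σ²/T = 0.36, confirming that the cross-product terms really do drop out in expectation.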

