Advanced Econometrics

Chapter 2
FINITE SAMPLE PROPERTIES OF THE OLS ESTIMATOR
• The model:

$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim N[0, \sigma^2 I]$$

with $X$ non-stochastic and $\text{rank}(X) = k$. Since $\varepsilon$ is random, $Y$ is random.

• $\hat{\beta} = (X'X)^{-1}X'Y$; $\hat{\beta}$ is a statistic computed from the sample, and it is random because $Y$ is random. Being random:

- $\hat{\beta}$ has a probability distribution, called the sampling distribution.
- Repeatedly draw all possible random samples of size $n$ and calculate $\hat{\beta}$ each time.

Let us explore some statistical properties of the OLS estimator $\hat{\beta}$ and build up its sampling distribution.
I. UNBIASEDNESS:
$$\hat{\beta} = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \varepsilon) = \underbrace{(X'X)^{-1}X'X}_{I}\beta + (X'X)^{-1}X'\varepsilon = \beta + (X'X)^{-1}X'\varepsilon$$

$$E(\hat{\beta}) = E[\beta + (X'X)^{-1}X'\varepsilon]$$
$$= \beta + E[(X'X)^{-1}X'\varepsilon] = \beta + (X'X)^{-1}X'\underbrace{E(\varepsilon)}_{0}$$

$$\Rightarrow \quad E(\hat{\beta}) = \beta$$
$\hat{\beta}$ is an estimator of $\beta$; it is a function of the random sample (the elements of $Y$).

Note: when we talk about the sample we mean $Y$ only, because $X$ is a constant (fixed) matrix. "Repeatedly draw all possible random samples of size $n$" therefore means "repeatedly draw $Y$".

The least squares estimator is unbiased for $\beta$ (given $E(\varepsilon) = 0$ and $X$ non-stochastic).
Next, the variance-covariance matrix:

$$\text{VarCov}(\hat{\beta}) = E[(\hat{\beta} - E(\hat{\beta}))(\hat{\beta} - E(\hat{\beta}))']$$

Since $\hat{\beta} - \beta = (X'X)^{-1}X'\varepsilon$:

$$\begin{aligned}
\text{VarCov}(\hat{\beta}) &= E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'] \\
&= E[((X'X)^{-1}X'\varepsilon)((X'X)^{-1}X'\varepsilon)'] \\
&= E[(X'X)^{-1}X'\varepsilon\varepsilon' X(X'X)^{-1}] \\
&= (X'X)^{-1}X'E(\varepsilon\varepsilon')X(X'X)^{-1} \\
&= (X'X)^{-1}X'\sigma_\varepsilon^2 I X(X'X)^{-1} \\
&= \sigma_\varepsilon^2 \underbrace{(X'X)^{-1}X'X}_{I}(X'X)^{-1} \\
&= \sigma_\varepsilon^2 (X'X)^{-1}
\end{aligned}$$

So: $\text{VarCov}(\hat{\beta}) = \sigma_\varepsilon^2 (X'X)^{-1}$.
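To make the sampling-distribution idea concrete, here is a minimal Monte Carlo sketch (the design matrix, coefficients, and all names are hypothetical): holding $X$ fixed, redraw $\varepsilon$ (hence $Y$) many times and compare the empirical mean and covariance of $\hat{\beta}$ with $\beta$ and $\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, sigma = 50, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # fixed, non-stochastic design
beta = np.array([1.0, 0.5, -0.3])
XtX_inv = np.linalg.inv(X.T @ X)

draws = []
for _ in range(20000):                       # "repeatedly draw Y": redraw eps only
    eps = rng.normal(0.0, sigma, n)
    y = X @ beta + eps
    draws.append(XtX_inv @ X.T @ y)          # beta_hat = (X'X)^{-1} X'Y
draws = np.array(draws)

print(draws.mean(axis=0))                    # ~ beta        (unbiasedness)
print(np.cov(draws.T))                       # ~ sigma^2 (X'X)^{-1}
print(sigma**2 * XtX_inv)
```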
For the model in deviations-from-means form (tildes denote deviations from sample means, $\tilde{X}_{i2} = X_{i2} - \bar{X}_2$, etc.):

$$\tilde{Y}_i = \hat{\beta}_2 \tilde{X}_{i2} + \hat{\beta}_3 \tilde{X}_{i3} + e_i$$

$$\text{VarCov}\begin{bmatrix} \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \sigma_\varepsilon^2 (X'X)^{-1} = \frac{\sigma_\varepsilon^2}{\sum \tilde{X}_{i2}^2 \sum \tilde{X}_{i3}^2 - \left(\sum \tilde{X}_{i2}\tilde{X}_{i3}\right)^2} \begin{bmatrix} \sum \tilde{X}_{i3}^2 & -\sum \tilde{X}_{i2}\tilde{X}_{i3} \\ -\sum \tilde{X}_{i2}\tilde{X}_{i3} & \sum \tilde{X}_{i2}^2 \end{bmatrix}$$

$$\rightarrow \quad \text{Var}(\hat{\beta}_2) = \frac{\sigma_\varepsilon^2 \sum \tilde{X}_{i3}^2}{\sum \tilde{X}_{i2}^2 \sum \tilde{X}_{i3}^2 - \left(\sum \tilde{X}_{i2}\tilde{X}_{i3}\right)^2} = \frac{\sigma_\varepsilon^2 \big/ \sum \tilde{X}_{i2}^2}{1 - \underbrace{\dfrac{\left(\sum \tilde{X}_{i2}\tilde{X}_{i3}\right)^2}{\sum \tilde{X}_{i2}^2 \sum \tilde{X}_{i3}^2}}_{r_{23}^2}}$$

where $r_{23}^2$ is the squared sample correlation between $X_{i2}$ and $X_{i3}$.

$$\rightarrow \quad \text{Var}(\hat{\beta}_2) = \frac{\sigma_\varepsilon^2}{\sum \tilde{X}_{i2}^2 (1 - r_{23}^2)}$$
$\text{Var}(\hat{\beta}_2)$ is determined by:

i. $\sigma_\varepsilon^2 \uparrow \;\rightarrow\; \text{Var}(\hat{\beta}_2) \uparrow$

ii. $r_{23}^2 \uparrow \;\rightarrow\; \text{Var}(\hat{\beta}_2) \uparrow$ (collinearity between the regressors inflates the variance)

iii. Variation in $X_{i2}$: $\sum \tilde{X}_{i2}^2 \uparrow \;\rightarrow\; \text{Var}(\hat{\beta}_2) \downarrow$

iv. Sample size: $n \uparrow \;\rightarrow\; \text{Var}(\hat{\beta}_2) \downarrow$ (more observations add to $\sum \tilde{X}_{i2}^2$); see the numerical sketch below.
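A quick numerical check (hypothetical data, not from the notes) that the diagonal element of $\sigma^2(X'X)^{-1}$ for $\hat{\beta}_2$ matches the $\sigma^2/\big(\sum \tilde{X}_{i2}^2 (1 - r_{23}^2)\big)$ form above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 200, 1.5
x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)           # deliberately correlated regressors
X = np.column_stack([np.ones(n), x2, x3])

direct = sigma2 * np.linalg.inv(X.T @ X)[1, 1]   # Var(beta2_hat) from (X'X)^{-1}

x2d, x3d = x2 - x2.mean(), x3 - x3.mean()        # deviations from means
r23_sq = (x2d @ x3d) ** 2 / ((x2d @ x2d) * (x3d @ x3d))
formula = sigma2 / ((x2d @ x2d) * (1.0 - r23_sq))

print(direct, formula)                           # the two values agree
```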
$\text{VarCov}(\hat{\beta}) = \sigma_\varepsilon^2 (X'X)^{-1}$, but we don't know $\sigma_\varepsilon^2$, so we need an estimator for it.

Define:

$$\hat{\sigma}_\varepsilon^2 = \frac{e'e}{n-k}$$

where $n$ = number of observations, $k$ = number of estimated coefficients, and $e'e = \sum e_i^2$ = sum of squared residuals.
• Show that $\hat{\sigma}_\varepsilon^2$ is an unbiased estimator:

$$e = M\varepsilon \;\rightarrow\; e'e = \varepsilon'M'M\varepsilon = \varepsilon'M\varepsilon$$
• Note: the trace of a square matrix $\underset{n \times n}{A}$ is the sum of its principal diagonal elements: $\text{tr}(A) = \sum_{i=1}^{n} a_{ii}$.

Rules: for $A$, $B$ both $n \times n$:

$$\text{tr}(A + B) = \text{tr}(A) + \text{tr}(B)$$
$$\text{tr}(AB) = \text{tr}(BA)$$
$$\text{tr}(\lambda A) = \lambda\,\text{tr}(A)$$

The trace is a linear operator: a sum of certain elements.
$$\begin{aligned}
E(e'e) &= E(\varepsilon'M\varepsilon) = E[\text{tr}(\varepsilon'M\varepsilon)] = E[\text{tr}(\varepsilon\varepsilon'M)] \\
&= \text{tr}[E(\varepsilon\varepsilon')M] = \text{tr}[\sigma_\varepsilon^2 I M] \\
&= \sigma_\varepsilon^2\,\text{tr}(M) = \sigma_\varepsilon^2[\text{tr}(I_n) - \text{tr}(X(X'X)^{-1}X')] \\
&= \sigma_\varepsilon^2[n - \text{tr}(\underbrace{(X'X)^{-1}X'X}_{I_{k \times k}})] = \sigma_\varepsilon^2(n-k)
\end{aligned}$$

And:

$$E(\hat{\sigma}_\varepsilon^2) = \frac{E(e'e)}{n-k} = \frac{\sigma_\varepsilon^2(n-k)}{n-k} = \sigma_\varepsilon^2$$

So $E(\hat{\sigma}_\varepsilon^2) = \sigma_\varepsilon^2$: $\hat{\sigma}_\varepsilon^2$ is an unbiased estimator of $\sigma_\varepsilon^2$.
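A Monte Carlo sketch (hypothetical design) of the result just proved: dividing $e'e$ by $n - k$ gives an unbiased estimator of $\sigma^2$, while dividing by $n$ does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma2 = 30, 4, 2.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual maker, tr(M) = n - k

ssr = np.empty(20000)
for i in range(ssr.size):
    e = M @ rng.normal(0.0, np.sqrt(sigma2), n)    # e = M eps
    ssr[i] = e @ e

print(ssr.mean() / (n - k))    # ~ sigma2: dividing by n - k is unbiased
print(ssr.mean() / n)          # ~ sigma2 (n - k) / n: dividing by n is biased down
```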
II. LINEARITY:
Any estimator that is a linear function of the random sample data is called a linear estimator.
Yi: random sample data.
$$\underset{k \times 1}{\hat{\beta}} = (X'X)^{-1}X'Y = \underset{k \times n}{A} \cdot \underset{n \times 1}{Y}$$

where $A = (X'X)^{-1}X'$ is non-random:

$$\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kn} \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$$

$$\rightarrow \quad \hat{\beta}_1 = a_{11}Y_1 + a_{12}Y_2 + \cdots + a_{1n}Y_n$$
$\rightarrow$ The OLS estimator $\hat{\beta}$ is linear and unbiased for $\beta$.

Because $\hat{\beta}$ is a linear function of $Y$, and $Y$ is a linear function of $\varepsilon$, if $\varepsilon$ is normal then $\hat{\beta}$ is normal. So the sampling distribution of the OLS estimator of $\beta$ is:

$$\hat{\beta} \sim N[\beta,\; \sigma_\varepsilon^2 (X'X)^{-1}]$$
III. EFFICIENCY:
Suppose we have two unbiased estimators $\hat{\theta}_1$, $\hat{\theta}_2$ of $\theta$. We say $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if $\text{Var}(\hat{\theta}_1) \leq \text{Var}(\hat{\theta}_2)$.

If $\underset{k \times 1}{\hat{\theta}_1}$, $\underset{k \times 1}{\hat{\theta}_2}$ are vector unbiased estimators of $\underset{k \times 1}{\theta}$, then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if

$$\Delta = V(\hat{\theta}_2) - V(\hat{\theta}_1)$$

is positive semi-definite.
IV. GAUSS - MARKOV THEOREM:

"Under the assumptions of the classical regression model, the least squares estimator of $\beta$, $\hat{\beta} = (X'X)^{-1}X'Y$, is the best linear unbiased estimator (BLUE)."

Linear: in $Y$.

Best: smallest variance among all linear unbiased estimators $b$:

$$\text{Var}(\hat{\beta}_j) \leq \text{Var}(b_j) \quad \forall j$$

Proof: Let $b$ be any other linear estimator of $\beta$:
$$\underset{k \times 1}{b} = \underset{k \times n}{A} \cdot \underset{n \times 1}{Y}$$

Unbiasedness requires $E(b) = \beta$:

$$E(b) = E(AY) = E(AX\beta + A\varepsilon) = AX\beta + 0 = AX\beta$$

For $E(b) = \beta$ to hold for all $\beta$:

$$\rightarrow \quad AX = I$$

Let $A = (X'X)^{-1}X' + C$, where $C$ is any non-stochastic $k \times n$ matrix. Then

$$I = AX = [(X'X)^{-1}X' + C]X = \underbrace{(X'X)^{-1}X'X}_{I} + CX \quad\rightarrow\quad CX = 0$$

$$b = AY = [(X'X)^{-1}X' + C][X\beta + \varepsilon] = \underbrace{(X'X)^{-1}X'X}_{I}\beta + (X'X)^{-1}X'\varepsilon + \underbrace{CX}_{0}\beta + C\varepsilon = \beta + (X'X)^{-1}X'\varepsilon + C\varepsilon$$
$$\begin{aligned}
\text{VarCov}(b) &= E[(b - \beta)(b - \beta)'] \\
&= E\{[(X'X)^{-1}X'\varepsilon + C\varepsilon][(X'X)^{-1}X'\varepsilon + C\varepsilon]'\} \\
&= E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} + (X'X)^{-1}X'\varepsilon\varepsilon'C' + C\varepsilon\varepsilon'X(X'X)^{-1} + C\varepsilon\varepsilon'C'] \\
&= \sigma_\varepsilon^2\underbrace{(X'X)^{-1}X'X}_{I}(X'X)^{-1} + \sigma_\varepsilon^2(X'X)^{-1}\underbrace{X'C'}_{(CX)'=0} + \sigma_\varepsilon^2\underbrace{CX}_{0}(X'X)^{-1} + \sigma_\varepsilon^2 CC' \\
&= \underbrace{\sigma_\varepsilon^2(X'X)^{-1}}_{\text{VarCov}(\hat{\beta})} + \sigma_\varepsilon^2 CC'
\end{aligned}$$

The $j$th diagonal element:

$$\text{Var}(b_j) = \text{Var}(\hat{\beta}_j) + \sigma_\varepsilon^2 \sum_{i=1}^{n} c_{ji}^2 \;\geq\; \text{Var}(\hat{\beta}_j) \qquad \forall j = 1, \dots, k$$

$\rightarrow$ $\hat{\beta}$ is the best linear unbiased estimator (BLUE).

$\rightarrow$ $\hat{\beta}$ is the efficient estimator (smallest variance).
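The proof can be mirrored numerically. In this sketch (hypothetical $X$ and $C$), $C$ is built so that $CX = 0$, and the resulting $\text{VarCov}(b) - \text{VarCov}(\hat{\beta}) = \sigma^2 CC'$ is confirmed positive semi-definite via its eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma2 = 40, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T            # MX = 0

C = rng.normal(size=(k, n)) @ M              # any such C satisfies CX = 0
V_ols = sigma2 * XtX_inv                     # VarCov(beta_hat)
V_b = V_ols + sigma2 * C @ C.T               # VarCov(b) of the alternative estimator

print(np.linalg.eigvalsh(V_b - V_ols) >= -1e-12)   # all True: difference is PSD
print(np.diag(V_b) >= np.diag(V_ols))              # Var(b_j) >= Var(beta_hat_j)
```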
V. REVIEW: STATISTICAL INFERENCE:
1. Linear functions of normal random variables are also normal:

$$\underset{n \times 1}{u} \sim N(\underset{n \times 1}{\mu},\; \underset{n \times n}{\Sigma})$$

$$\underset{m \times 1}{Z} = \underset{m \times n}{P}\,\underset{n \times 1}{u} \quad\rightarrow\quad Z \text{ is normally distributed.}$$

$$E(Z) = E(Pu) = P\,E(u) = P\mu$$

$$\begin{aligned}
\text{VarCov}(Z) &= E[(Z - E(Z))(Z - E(Z))'] \\
&= E[(Pu - P\mu)(Pu - P\mu)'] \\
&= P\,\underbrace{E[(u - \mu)(u - \mu)']}_{\Sigma}\,P' = P\Sigma P'
\end{aligned}$$

Then $Z \sim N(P\mu,\; P\Sigma P')$.
2. Chi-squared distribution:

If $\underset{r \times 1}{Z} \sim N(0, I)$ then $Z'Z$ has the chi-squared distribution with $r$ degrees of freedom:

$$Z'Z \sim \chi^2_{[r]}$$

$r$: the number of independent standard normal variables in the sum of squares.

Theorem:

If $\underset{n \times 1}{Z} \sim N(0, I)$ and $\underset{n \times n}{A}$ is idempotent with rank equal to $r$, then:

i. $Z'AZ \sim \chi^2_{[r]}$

ii. $r = \text{tr}(A) = \text{rank}(A)$
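A simulation sketch of the theorem (the matrix $A$ is hypothetical, built as a residual-maker matrix so it is symmetric idempotent): the quadratic form's mean and variance match those of a $\chi^2$ with $\text{tr}(A)$ degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 20, 5
X = rng.normal(size=(n, k))
A = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # symmetric idempotent, rank n - k

Z = rng.normal(size=(100000, n))                   # rows are N(0, I) draws
q = np.einsum('ij,jk,ik->i', Z, A, Z)              # Z'AZ for each draw

print(np.trace(A))                                 # ~ 15 = n - k
print(q.mean(), q.var())                           # chi2(15): mean 15, variance 30
```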
3. Eigenvalue - eigenvector problem:

For a square matrix $\underset{n \times n}{A}$, we can find $n$ pairs $(\lambda_j, c_j)$ such that:

$$\underset{n \times n}{A}\,\underset{n \times 1}{c_j} = \underset{1 \times 1}{\lambda_j}\,\underset{n \times 1}{c_j} \qquad j = 1, 2, \dots, n$$
Normalizing: $c_j'c_j = 1$ (i.e. $\sum_{i=1}^{n} c_{ij}^2 = 1$).

For a symmetric $A$, the eigenvectors are orthogonal to each other:

$$c_i'c_j = 0 \quad (\forall i \neq j)$$

so $C = [c_1, c_2, \dots, c_n]$ is an orthogonal matrix: $C'C = I$ (i.e. $C' = C^{-1}$).
Collecting all eigenvectors of the eigenvalue - eigenvector problem:

$$\underset{n \times n}{C} = [c_1 \;\; c_2 \;\; \cdots \;\; c_n], \qquad c_j = \begin{bmatrix} c_{1j} \\ c_{2j} \\ \vdots \\ c_{nj} \end{bmatrix}, \qquad C'C = I,\; C' = C^{-1}$$

$$AC = A[c_1 \;\; c_2 \;\; \cdots \;\; c_n] = [Ac_1 \;\; Ac_2 \;\; \cdots \;\; Ac_n] = [c_1\lambda_1 \;\; c_2\lambda_2 \;\; \cdots \;\; c_n\lambda_n] = C\underbrace{\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}}_{\Lambda} = C\Lambda$$

where $\Lambda$ is a diagonal matrix, so:

$$C'AC = C'C\Lambda = \Lambda$$

and also $\text{rank}(A) = \text{rank}(\Lambda)$ = the number of non-zero $\lambda_j$'s.

Note: $C'AC = \Lambda \;\rightarrow\; A = (C')^{-1}\Lambda C^{-1} = C\Lambda C'$.

Remember: $A = C\Lambda C'$ and $C'AC = \Lambda$; $C'C = I$, $C' = C^{-1}$.
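These identities are easy to check numerically; a small sketch with a hypothetical symmetric matrix (numpy's symmetric eigensolver returns the eigenvalues $\Lambda$ and the orthonormal eigenvectors $C$):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.normal(size=(4, 4))
A = B + B.T                                    # a symmetric matrix

lam, C = np.linalg.eigh(A)                     # eigenvalues, orthonormal eigenvectors
print(np.allclose(C.T @ C, np.eye(4)))         # C'C = I
print(np.allclose(C @ np.diag(lam) @ C.T, A))  # A = C Lambda C'
print(np.allclose(C.T @ A @ C, np.diag(lam)))  # C'AC = Lambda
```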
Theorem:

Let $\underset{n \times n}{A}$ be an idempotent matrix with rank $r$ and let $\underset{n \times 1}{Z} \sim N(0, I)$. Then:

$$Z'AZ \sim \chi^2_{[r]} \qquad \text{and} \qquad \text{rank}(A) = \text{tr}(A)$$
Proof: $C'AC = \Lambda$, with $\underset{n \times 1}{Z} \sim N(0, I)$.
For $A$ idempotent, $\lambda_j = 0$ or $1$:

Because $Ac_j = c_j\lambda_j \;\rightarrow\; AAc_j = Ac_j\lambda_j = c_j\lambda_j^2$, and $AA = A$, so:

$$c_j\lambda_j^2 = c_j\lambda_j \;\rightarrow\; c_j(\lambda_j^2 - \lambda_j) = 0 \;\rightarrow\; \lambda_j(\lambda_j - 1) = 0 \;\rightarrow\; \lambda_j = 0 \text{ or } \lambda_j = 1$$

Write (ordering the unit eigenvalues first):

$$C'AC = \Lambda = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$$

There must be $r$ non-zero elements of $\Lambda$, because $\text{rank}(A) = r = \text{rank}(\Lambda) = \text{tr}(\Lambda)$ since all diagonal elements are 0 or 1.

Also, using the rule $\text{tr}(AB) = \text{tr}(BA)$:

$$\text{tr}(\Lambda) = \text{tr}(C'AC) = \text{tr}(ACC') = \text{tr}(A)$$

so $\text{rank}(A) = \text{tr}(A) = r$.
Now let $\underset{n \times 1}{u} = \underset{n \times n}{C'}\,\underset{n \times 1}{Z}$, with $Z \sim N(0, I)$:

$$E(uu') = E(C'ZZ'C) = C'\underbrace{E(ZZ')}_{I}C = C'C = I$$

so $u \sim N(0, I)$ as well. Construct the quadratic form:

$$u'\Lambda u = Z'C(C'AC)C'Z = Z'\underbrace{CC'}_{I}A\underbrace{CC'}_{I}Z = Z'AZ = \sum_{i=1}^{r} u_i^2 \;\sim\; \chi^2_{[r]}$$
So if $Z \sim N(0, I)$ and $\underset{n \times n}{A}$ is idempotent with rank equal to $r$, then $Z'AZ \sim \chi^2_{[r]}$.

Extension: if $Z \sim N(0, \sigma^2 I)$, then

$$\frac{Z'AZ}{\sigma^2} \sim \chi^2_{[r]}$$
4. Other distributions:

Let $Z \sim N(0, 1)$ and $W \sim \chi^2_{[r]}$, with $Z$ and $W$ independently distributed. Then:

$$\frac{Z}{\sqrt{W/r}} \sim t_{[r]}$$

has the t-distribution with $r$ degrees of freedom.

Let $W \sim \chi^2_{[r]}$ and $v \sim \chi^2_{[s]}$, with $W$ and $v$ independently distributed. Then:

$$\frac{W/r}{v/s} \sim F^r_s$$

has the F-distribution with $r$ (numerator) and $s$ (denominator) degrees of freedom.
VI. TESTING HYPOTHESES ON INDIVIDUAL COEFFICIENTS:

• The model: $Y = X\beta + \varepsilon$ with $\varepsilon \sim N[0, \sigma^2 I]$.

Recall: $\hat{\beta} \sim N[\beta,\; \sigma_\varepsilon^2(X'X)^{-1}]$,

so $\hat{\beta}_j \sim N\big[\beta_j,\; \sigma_\varepsilon^2[(X'X)^{-1}]_{jj}\big]$

$$\rightarrow \quad \frac{\hat{\beta}_j - \beta_j}{\sqrt{\sigma^2 (X'X)^{-1}_{jj}}} \sim N[0, 1]$$

but $\sigma^2$ is unknown, so this can't be used directly for constructing tests or confidence intervals.
$e'e = \varepsilon'M'M\varepsilon = \varepsilon'M\varepsilon$, where $M$ is idempotent with $\text{rank}(M) = \text{tr}(M) = n - k$.

$$\varepsilon \sim N[0, \sigma^2 I] \;\rightarrow\; \underset{(n \times 1)}{\varepsilon/\sigma} \sim N[0, I]$$

$$\Rightarrow \quad \frac{e'e}{\sigma^2} = \frac{\varepsilon'M\varepsilon}{\sigma^2} \sim \chi^2_{[n-k]}$$
So, following the theorem on the t-distribution (the numerator and denominator are independent):

$$\frac{\dfrac{\hat{\beta}_j - \beta_j}{\sqrt{\sigma^2 (X'X)^{-1}_{jj}}}}{\sqrt{\dfrac{e'e}{\sigma^2}\Big/(n-k)}} \;\sim\; t_{n-k}
\qquad\Leftrightarrow\qquad
\frac{\hat{\beta}_j - \beta_j}{\sqrt{\underbrace{\dfrac{e'e}{n-k}}_{\hat{\sigma}^2}\,(X'X)^{-1}_{jj}}} \;\sim\; t_{n-k}
\qquad\Leftrightarrow\qquad
\frac{\hat{\beta}_j - \beta_j}{\sqrt{\hat{\sigma}^2 (X'X)^{-1}_{jj}}} \;\sim\; t_{n-k}$$

$\hat{\sigma}^2 (X'X)^{-1}_{jj} = \hat{\sigma}^2_{\hat{\beta}_j}$ is the estimated variance of $\hat{\beta}_j$; its square root, $SE(\hat{\beta}_j)$, is the standard error of $\hat{\beta}_j$.

Finally:

$$\frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)} \sim t_{n-k}$$

This basic result enables us to test hypotheses about elements of $\beta$ and to construct confidence intervals for them (note that we need the assumption of normality of the $\varepsilon$'s).
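An end-to-end sketch of forming the t statistic for $H_0$: $\beta_j = 0$ (the data are simulated; the coefficient values 1.4, 0.2, 0.6 are borrowed from the example below, everything else is assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, k = 20, 3                                  # so n - k = 17, as in the example
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.4, 0.2, 0.6]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
sigma2_hat = e @ e / (n - k)                  # unbiased estimator of sigma^2
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))   # SE(beta_hat_j)

t_stat = beta_hat / se                        # t statistics for H0: beta_j = 0
p_two_sided = 2 * stats.t.sf(np.abs(t_stat), df=n - k)
print(t_stat, p_two_sided)
```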
EX: $\hat{y}_i = \underset{(0.7)}{1.4} + \underset{(0.05)}{0.2}\,x_{i2} + 0.6\,x_{i3}$ (standard errors in parentheses)

$H_0$: $\beta_2 = 0$
$H_1$: $\beta_2 > 0$

$$t = \frac{\hat{\beta}_2 - 0}{SE(\hat{\beta}_2)} = \frac{0.2 - 0}{0.05} = 4, \qquad \text{d.o.f.} = n - k = 17$$

Critical values: $t_\alpha(5\%) = 1.74$, $t_\alpha(1\%) = 2.567$.

$t > t_\alpha \;\rightarrow\;$ reject $H_0$.
EX: $H_0$: $\beta_1 = 1.5$
$H_1$: $\beta_1 \neq 1.5$ (or $> 1.5$, or $< 1.5$, for one-sided tests)

$$t = \frac{\hat{\beta}_1 - 1.5}{SE(\hat{\beta}_1)} = \frac{1.4 - 1.5}{0.7} = -0.1429, \qquad \text{d.o.f.} = n - k = 17$$

(Two-tailed test: 2.5% in each tail.)

$|t| < t_{\alpha/2} \;\Rightarrow\;$ cannot reject $H_0$ at 5%.
VII. CONFIDENCE INTERVALS:

Recall:

$$t_i = \frac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)} \sim t_{n-k}$$

so

$$\Pr[-t_{\alpha/2} \leq t_i \leq t_{\alpha/2}] = 1 - \alpha$$

$$\Pr\left[-t_{\alpha/2} \leq \frac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)} \leq t_{\alpha/2}\right] = 1 - \alpha$$

$$\Pr[\hat{\beta}_i - t_{\alpha/2}SE(\hat{\beta}_i) \leq \beta_i \leq \hat{\beta}_i + t_{\alpha/2}SE(\hat{\beta}_i)] = 1 - \alpha$$

• If we were to take a sample of size $n$ and construct this interval, and repeat this many times, then 100(1−α)% of such intervals would cover the true value of $\beta_i$.

• If we construct the interval once, there is no guarantee that the interval will cover the true $\beta_i$.
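A minimal sketch of the interval computation, plugging in the numbers from the first example above ($\hat{\beta}_j = 0.2$, $SE = 0.05$, $n - k = 17$):

```python
from scipy import stats

beta_hat_j, se_j, dof = 0.2, 0.05, 17          # numbers from the example above
t_crit = stats.t.ppf(0.975, df=dof)            # two-sided 5% critical value, ~ 2.110
print(beta_hat_j - t_crit * se_j, beta_hat_j + t_crit * se_j)
```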
• Types of errors: size & power of tests.

Type I: Reject $H_0$ when it is true.
Type II: Accept $H_0$ when it is false.

Assume:

$\Pr(\text{Type I error}) = \alpha$
$\Pr(\text{Type II error}) = \beta$

If the sample size is fixed: $\alpha \downarrow \;\Rightarrow\; \beta \uparrow$.

We call $\alpha$ the significance level, or size, of the test.

$\rightarrow$ Fix $\alpha$ and try to design the test so as to minimize $\beta$.

• Definition: The power of a test is $1 - \beta$:

Power = 1 − Pr(accept $H_0$ | $H_0$ false) = Pr(reject $H_0$ | $H_0$ false)
• A test is "uniformly most powerful" if its power exceeds that of any other test (for the same choice of $\alpha$) over all possible alternative hypotheses.

• A test is "consistent" if its power $\rightarrow 1$ as $n \rightarrow \infty$ for any false hypothesis.

• A test is "unbiased" if its power never falls below $\alpha$.
VIII. FAMILY OF F-TESTS:

For general linear restrictions: the unrestricted model (U-model) is the original model, and $H_0$ imposes some restrictions on $\underset{k \times 1}{\beta}$, which define the restricted model (R-model):

$$F^r_{df_U} = \frac{(ESS_R - ESS_U)/r}{ESS_U/df_U}$$

$ESS_R$ = error sum of squares from the R-model: $e_R'e_R$
$ESS_U$ = error sum of squares from the U-model: $e_U'e_U$
$r$ = number of restrictions in $H_0$
$df_U$ = degrees of freedom in the U-model = $n - k$.
$$\frac{ESS_U}{\sigma^2} = \frac{e_U'e_U}{\sigma^2} = \frac{\varepsilon'M\varepsilon}{\sigma^2} = \frac{\varepsilon'}{\sigma}M\frac{\varepsilon}{\sigma} \;\sim\; \chi^2_{[n-k]}$$

$$\frac{ESS_R}{\sigma^2} \;\sim\; \chi^2_{[n-(k-r)]}$$

$$\rightarrow \quad \frac{ESS_R}{\sigma^2} - \frac{ESS_U}{\sigma^2} \;\sim\; \chi^2_{[r]}$$

(and this difference is independent of $ESS_U/\sigma^2$), so:

$$\frac{(ESS_R - ESS_U)/\sigma^2 r}{ESS_U/(n-k)\sigma^2} = \frac{(ESS_R - ESS_U)/r}{ESS_U/(n-k)}$$

$$\rightarrow \quad \frac{(ESS_R - ESS_U)/r}{ESS_U/(n-k)} \;\sim\; F^r_{n-k}$$
Case 1: Joint significance of all slopes:

$$\underset{k \times 1}{\beta} = \begin{bmatrix} \beta_1 \\ \underset{(k-1) \times 1}{\beta_2} \end{bmatrix}, \qquad H_0: \underset{(k-1) \times 1}{\beta_2} = 0 \;\rightarrow\; r = k - 1$$

U-model: $Y = X\beta + \varepsilon \;\rightarrow\; ESS_U = e'e$, $df_U = n - k$

R-model: $Y_i = \beta_1 + \varepsilon_i \;\rightarrow\; \hat{\beta}_1 = \bar{Y} \;\rightarrow\; Y_i = \bar{Y} + e_i$

$$ESS_R = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$$

$$\rightarrow \quad F^{k-1}_{n-k} = \frac{\left(\sum_{i=1}^{n}(Y_i - \bar{Y})^2 - e'e\right)/(k-1)}{e'e/(n-k)} = \frac{R^2/(k-1)}{(1 - R^2)/(n-k)}$$
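A sketch with simulated data (hypothetical design and coefficients) confirming that the ESS form and the $R^2$ form of the joint-significance F statistic coincide:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, k = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.4, 0.0, -0.2]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat
ess_u = e @ e                                  # unrestricted ESS
ess_r = ((y - y.mean()) ** 2).sum()            # restricted model: intercept only
r2 = 1.0 - ess_u / ess_r

F_ess = ((ess_r - ess_u) / (k - 1)) / (ess_u / (n - k))
F_r2 = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))
print(F_ess, F_r2)                             # identical by construction
print(stats.f.sf(F_ess, k - 1, n - k))         # p-value
```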
Case 2: Joint significance of a subset of coefficients:

$$\underset{k \times 1}{\beta} = \begin{bmatrix} \underset{(k-r) \times 1}{\beta_1} \\ \underset{r \times 1}{\beta_2} \end{bmatrix}, \qquad H_0: \underset{r \times 1}{\beta_2} = 0$$

U-model: $Y = X\beta + \varepsilon \;\rightarrow\; ESS_U = e_U'e_U$

R-model: $Y = X_1\beta_1 + \varepsilon \;\rightarrow\; ESS_R = e_R'e_R$

$$\rightarrow \quad F^r_{n-k} = \frac{(ESS_R - ESS_U)/r}{ESS_U/(n-k)}$$
EX: Translog production function:

$$\log Y = \beta_1 + \beta_2\log K + \beta_3\log L + \beta_4(\log K)^2/2 + \beta_5(\log L)^2/2 + \beta_6(\log K \cdot \log L) + \varepsilon$$

$H_0$: $\beta_4 = \beta_5 = \beta_6 = 0$ (the Cobb-Douglas restrictions).

$n = 27$, $r = 3$, $n - k = 21$
$ESS_U = 0.67993$, $ESS_R = 0.85163$
$$\rightarrow \quad F^r_{n-k} = \frac{(0.85163 - 0.67993)/3}{0.67993/21} = 1.768; \qquad \text{critical value: } F^3_{21,\,5\%} = 3.1$$

$F^r_{n-k} <$ critical value $\;\Rightarrow\;$ do not reject $H_0$; conclude that the data are consistent with the Cobb-Douglas model.
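Reproducing the arithmetic with the numbers given in the example (scipy is used only for the critical value):

```python
from scipy import stats

ess_u, ess_r, r, df_u = 0.67993, 0.85163, 3, 21
F = ((ess_r - ess_u) / r) / (ess_u / df_u)
print(F)                                   # ~ 1.768
print(stats.f.ppf(0.95, r, df_u))          # 5% critical value ~ 3.07 (rounded to 3.1 above)
```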
Case 3: General restrictions.

$$\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}, \qquad \underset{r \times k}{R}\,\underset{k \times 1}{\beta} = \underset{r \times 1}{C}$$

Restriction: $\beta_2 + \beta_3 = 1$

$$\rightarrow \quad \underbrace{[0 \;\; 1 \;\; 1]}_{R}\beta = 1 \qquad (r = 1)$$

If the restrictions are:

$$\begin{cases} \beta_2 + \beta_3 = 1 \\ \beta_1 = 0 \end{cases} \qquad (r = 2)$$

$$\rightarrow \quad \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \end{bmatrix}\beta = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
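For completeness, here is a sketch of testing $R\beta = C$ directly via the standard Wald form of the F statistic, $F = (R\hat{\beta} - c)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta} - c)/(r\hat{\sigma}^2)$, which is algebraically equivalent to the $ESS_R$ vs $ESS_U$ comparison above; the data and coefficients are hypothetical and are chosen so that $H_0$ holds:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, k = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.0, 0.7, 0.3]) + rng.normal(size=n)   # beta2 + beta3 = 1, beta1 = 0

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - k)                          # sigma^2 hat

R = np.array([[0.0, 1.0, 1.0],                # beta2 + beta3 = 1
              [1.0, 0.0, 0.0]])               # beta1 = 0
c = np.array([1.0, 0.0])
d = R @ b - c
F = d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d / (len(c) * s2)
print(F, stats.f.sf(F, len(c), n - k))        # small F, large p: do not reject H0
```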
Jarque-Bera statistic:

$H_0$: the $\varepsilon_i$ are normally distributed.
$H_1$: the $\varepsilon_i$ are not normally distributed.

$$JB = n\left[\frac{SK^2}{6} + \frac{(Kur - 3)^2}{24}\right] \;\sim\; \chi^2_{[2]}$$

where $SK$ is the sample skewness and $Kur$ the sample kurtosis of the residuals.

Reject $H_0$ for large $JB$: reject if $JB$ exceeds the $\chi^2_{[2]}$ critical value (5.99 at the 5% level) or if the p-value < 0.05.
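A sketch applying the test to a residual series using scipy's implementation (scipy.stats.jarque_bera); with normal data the statistic should be small and the p-value large:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
e = rng.normal(size=200)                 # residuals; normal here, so H0 is true

jb_stat, p_value = stats.jarque_bera(e)
print(jb_stat, p_value)                  # small statistic, p > 0.05: do not reject H0
```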