
CHAPTER 12

Estimation I — properties of estimators

Estimation in what follows refers to point estimation unless indicated otherwise. Let (S, ℱ, P(·)) be the probability space of reference, with X a r.v. defined on this space. The following statistical model is postulated:

(i) Φ = {f(x; θ), θ ∈ Θ}, Θ ⊆ ℝ;
(ii) X ≡ (X₁, X₂, ..., Xₙ)′ is a random sample from f(x; θ).

Estimation in the context of this statistical model takes the form of constructing a mapping h(·): 𝒳 → Θ, where 𝒳 is the observation space and h(·) is a Borel function. The composite function (a statistic) θ̂ = h(X): S → Θ is called an estimator, and its value h(x), x ∈ 𝒳, an estimate. It is important to distinguish between the two, because the former is a random variable (r.v.) and the latter is a real number.
Example 1

Let f(x; θ) = [1/√(2π)] exp{−½(x − θ)²}, θ ∈ ℝ, and let X be a random sample from f(x; θ). Then 𝒳 = ℝⁿ and the following functions define estimators of θ:

(i) θ̂₁ = (1/n) Σᵢ₌₁ⁿ Xᵢ;
(ii) θ̂₂ = (1/k) Σᵢ₌₁ᵏ Xᵢ, k = 1, 2, ..., n−1;
(iii) θ̂₃ = ½(X₁ + Xₙ);
(iv) θ̂₄ = (1/n)(X₁ + Xₙ);
(v) θ̂₅ = (1/n) Σᵢ₌₁ⁿ Xᵢ²;
(vi) θ̂₆ = (1/n) Σᵢ₌₁ⁿ iXᵢ;
(vii) θ̂₇ = [1/(n + 1)] Σᵢ₌₁ⁿ Xᵢ.

It is obvious that we can construct infinitely many such estimators.
However, constructing ‘good’ estimators is not so obvious. From the above
examples it is clear that we need some criteria to choose between these
estimators. In other words, we need to formalise what we mean by a ‘good’
estimator. Moreover, it will be of considerable help if we could devise
general methods of constructing such good estimators; a question
considered in the next chapter.
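The behaviour of these estimators can be explored informally by simulation. The following sketch (Python with NumPy; an illustration only, with the estimator numbering of Example 1 and arbitrary values of θ, n and k assumed) draws many samples of size n and tabulates the empirical mean and variance of each of the seven estimators:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, k, reps = 1.0, 20, 5, 100_000

# reps independent random samples of size n from N(theta, 1)
X = rng.normal(theta, 1.0, size=(reps, n))

estimators = {
    "theta_1": X.mean(axis=1),                             # (1/n) sum X_i
    "theta_2": X[:, :k].mean(axis=1),                      # (1/k) sum of first k
    "theta_3": 0.5 * (X[:, 0] + X[:, -1]),                 # (X_1 + X_n)/2
    "theta_4": (X[:, 0] + X[:, -1]) / n,                   # (X_1 + X_n)/n
    "theta_5": (X ** 2).mean(axis=1),                      # (1/n) sum X_i^2
    "theta_6": (X * np.arange(1, n + 1)).sum(axis=1) / n,  # (1/n) sum i*X_i
    "theta_7": X.sum(axis=1) / (n + 1),                    # sum X_i / (n+1)
}
for name, values in estimators.items():
    print(f"{name}: mean = {values.mean():+.3f}, variance = {values.var():.3f}")
```

Running such a sketch shows at once that the seven estimators are centred and dispersed very differently, which is exactly what the criteria developed below are designed to capture.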

12.1 Finite sample properties

In order to be able to set up criteria for choosing between estimators we need to understand the role of an estimator first. An estimator is constructed with the sole aim of providing us with the 'most representative value' of θ in the parameter space Θ, based on the available information in the form of the statistical model. Given that the estimator θ̂ = h(X) is a r.v. (being a Borel function of a random vector X), any formalisation of what we mean by a 'most representative value' must be in terms of the distribution of θ̂, say f(θ̂). This is because any statement about 'how near θ̂ is to the true θ' can only be a probabilistic one.

The obvious property to require a 'good' estimator θ̂ of θ to satisfy is that f(θ̂) is centred around θ.
Definition 1

An estimator θ̂ of θ is said to be an unbiased estimator of θ if

E(θ̂) = ∫ θ̂ f(θ̂) dθ̂ = θ.    (12.1)

That is, the distribution of θ̂ has mean equal to the unknown parameter to be estimated.


Note that an alternative, but equivalent, way to define E(θ̂) is

E(θ̂) = ∫ ··· ∫ h(x) f(x; θ) dx,    (12.2)

where f(x; θ) ≡ f(x₁, x₂, ..., xₙ; θ) is the distribution of the sample X.

Sometimes we can derive E(θ̂) without having to derive either of the above distributions, by just using the properties of E(·) (see Chapter 4). For example, in the case of the estimators suggested in Example 1, using independence and the properties of the normal distribution we can deduce that θ̂₁ ~ N(θ, 1/n); this is because θ̂₁ is a linear function of normally distributed r.v.'s (see Chapter 6.3), and

E(θ̂₁) = E[(1/n) Σᵢ₌₁ⁿ Xᵢ] = (1/n) Σᵢ₌₁ⁿ E(Xᵢ) = (1/n) Σᵢ₌₁ⁿ θ = (1/n)(nθ) = θ    (12.3)

(see Fig. 12.1). The second equality is due to independence and the property E(c) = c if c is a constant, and the third equality follows from the identically distributed assumption. Similarly, for the variance of θ̂₁,

Var(θ̂₁) = E(θ̂₁ − θ)² = E{[(1/n) Σᵢ₌₁ⁿ (Xᵢ − θ)]²} = (1/n²) Σᵢ₌₁ⁿ E(Xᵢ − θ)² = (1/n²)(n · 1) = 1/n.    (12.4)
Using similar arguments we can deduce that

θ̂₂ ~ N(θ, 1/k), k = 1, 2, ..., n−1,
θ̂₃ ~ N(θ, ½),
θ̂₄ ~ N(2θ/n, 2/n²),
θ̂₅ ~ n⁻¹χ²(n; nθ²) (a non-central chi-square scaled by 1/n),
θ̂₆ ~ N([(n + 1)/2]θ, (n + 1)(2n + 1)/(6n)),
θ̂₇ ~ N([n/(n + 1)]θ, n/(n + 1)²).

[Fig. 12.1. The sampling distribution of θ̂₁.]
Hence, the estimators θ̂₁, θ̂₂ and θ̂₃ are indeed unbiased, but θ̂₄, θ̂₅, θ̂₆ and θ̂₇ are biased. Defining the bias to be B(θ̂) = E(θ̂) − θ, we have B(θ̂₄) = [(2 − n)/n]θ, B(θ̂₅) = (1 + θ²) − θ, B(θ̂₆) = [(n − 1)/2]θ and B(θ̂₇) = −θ/(n + 1). As can be seen from the above discussion, it is often possible to derive the mean of an estimator θ̂ without having to derive its distribution. It must be remembered, however, that unbiasedness is a property based on the distribution of θ̂. This distribution is often called the sampling distribution of θ̂, in order to distinguish it from any other distribution of functions of r.v.'s.
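These bias expressions lend themselves to a numerical cross-check (again an informal Python sketch, with the estimator numbering of Example 1 and arbitrary θ and n assumed), comparing the empirical bias of each biased estimator with the corresponding formula:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.5, 10, 200_000
X = rng.normal(theta, 1.0, size=(reps, n))

checks = {  # estimator values paired with the theoretical bias B(theta_hat)
    "theta_4": ((X[:, 0] + X[:, -1]) / n,                  (2 - n) / n * theta),
    "theta_5": ((X ** 2).mean(axis=1),                     (1 + theta**2) - theta),
    "theta_6": ((X * np.arange(1, n + 1)).sum(axis=1) / n, (n - 1) / 2 * theta),
    "theta_7": (X.sum(axis=1) / (n + 1),                   -theta / (n + 1)),
}
for name, (values, bias) in checks.items():
    print(f"{name}: empirical bias = {values.mean() - theta:+.4f}, formula = {bias:+.4f}")
```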

Although unbiasedness seems at first sight to be a highly desirable property, it turns out to be a rather severe restriction in some cases, and in most situations there are too many unbiased estimators for this property to be used as the sole criterion for judging estimators. The question which naturally arises is, 'how can we choose among unbiased estimators?'. Returning to the above example, we can see that the unbiased estimators θ̂₁, θ̂₂ and θ̂₃ have the same mean, but they do not have the same variances. Given that the variance is a measure of dispersion, intuition suggests that the estimator with the smallest variance is in a sense better, because its distribution is more 'concentrated' around θ. This argument leads to the second property, that of relative efficiency.
Definition 2

An unbiased estimator θ̂₁ of θ is said to be relatively more efficient than some other unbiased estimator θ̂₂ if

Var(θ̂₁) < Var(θ̂₂),  or  eff(θ̂₁, θ̂₂) = Var(θ̂₁)/Var(θ̂₂) < 1.    (12.5)

In the above definition, θ̂₁ is relatively more efficient than either θ̂₂ or θ̂₃, since

Var(θ̂₁) = 1/n < 1/k = Var(θ̂₂), k = 1, 2, ..., n−1,

and

Var(θ̂₂) = 1/k < ½ = Var(θ̂₃) for k > 2,

i.e. θ̂₂ is relatively more efficient than θ̂₃ (see Fig. 12.2).
In the case of biased estimators, relative efficiency can be defined in terms of the mean square error (MSE), which takes the form

E(θ̂ − θ)² = Var(θ̂) + [B(θ̂)]²,    (12.6)



[Fig. 12.2. The sampling distributions of θ̂₁, θ̂₂ and θ̂₃.]
that is, an estimator θ̂* is relatively more efficient than θ̂ if

E(θ̂* − θ)² ≤ E(θ̂ − θ)²,  or  MSE(θ̂*) ≤ MSE(θ̂).

As can be seen, this definition includes the definition in the case of unbiased estimators as a special case. Moreover, the definition in terms of the MSE enables us to compare an unbiased with a biased estimator in terms of efficiency. For example, MSE(θ̂₇) < MSE(θ̂₃) for moderate values of θ, and intuition suggests that θ̂₇ is a 'better' estimator than θ̂₃, despite the fact that θ̂₃ is unbiased and θ̂₇ is not; Fig. 12.3 illustrates the case. In circumstances like this it seems a bit unreasonable to insist on unbiasedness. Caution should be exercised, however, when different distributions are involved, as in the case of θ̂₅.
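To see the comparison concretely, note that MSE(θ̂₃) = Var(θ̂₃) = ½, while MSE(θ̂₇) = Var(θ̂₇) + [B(θ̂₇)]² = (n + θ²)/(n + 1)². A short sketch (Python assumed; the values of θ and n are illustrative) tabulates the two:

```python
import numpy as np

n = 20
for theta in np.linspace(-3.0, 3.0, 7):
    mse3 = 0.5                              # theta_3: unbiased, Var = 1/2
    mse7 = (n + theta**2) / (n + 1) ** 2    # theta_7: Var + squared bias
    print(f"theta = {theta:+.1f}: MSE(theta_3) = {mse3:.3f}, MSE(theta_7) = {mse7:.4f}")
```

For moderate values of θ the biased estimator θ̂₇ has the smaller MSE, although for θ² large enough the inequality is eventually reversed.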
Let us consider the concept of MSE in some detail. As defined above, the MSE of θ̂ depends not only on θ̂ but on the value of θ in Θ chosen as well. That is, for some θ₀ ∈ Θ,

MSE(θ̂, θ₀) = E(θ̂ − θ₀)² = E{[θ̂ − E(θ̂)] + [E(θ̂) − θ₀]}² = Var(θ̂) + [B(θ̂, θ₀)]²,    (12.7)

the cross-product term being zero. B(θ̂, θ₀) = E(θ̂) − θ₀ is the bias of θ̂ relative to the value θ₀. Using the MSE as a criterion for optimal estimators, we would like to have an estimator θ̂ with the smallest MSE for all θ in Θ


[Fig. 12.3. Comparing the sampling distributions of θ̂₇ and θ̂₃.]

(uniformly in Θ). That is,

MSE(θ̂, θ) ≤ MSE(θ̃, θ) for all θ ∈ Θ,

where θ̃ denotes any other estimator of θ. For any two estimators θ̂ and θ̃ of θ, if MSE(θ̂, θ) ≤ MSE(θ̃, θ) for all θ ∈ Θ, with strict inequality holding for some θ ∈ Θ, θ̃ is said to be inadmissible. For example, θ̂₂ above is inadmissible because

MSE(θ̂₁, θ) < MSE(θ̂₂, θ) for all θ ∈ Θ if n > 1.

In view of this we can see that θ̂₂ and θ̂₃ are inadmissible, because in MSE terms they are dominated by θ̂₁. The question which naturally arises is: 'Can we find an estimator which dominates every other in MSE terms?' A moment's reflection suggests that this is impossible, because the MSE criterion depends on the value of θ chosen. In order to see this, let us choose a particular value of θ in Θ, say θ₀, and define the estimator

θ̂* = θ₀ for all x ∈ 𝒳.

Then MSE(θ̂*, θ₀) = 0, and any uniformly best estimator would have to satisfy MSE(θ̂*, θ) = 0 for all θ ∈ Θ, since θ₀ was arbitrarily chosen. That is, it would have to estimate θ perfectly whatever its 'true' value! Who needs criteria in such a case? Hence, the fact that there are no uniformly best estimators in MSE terms is due to the nature of the problem itself.

Using the concept of relative efficiency we can compare the various estimators we happen to consider. This, however, is not very satisfactory, since there might be much better estimators in terms of MSE about which we know nothing. In order to avoid merely choosing the better of two inefficient estimators, we need some absolute measure of efficiency. Such a measure is provided by the Cramer–Rao lower bound, which takes the form

CR(θ) = [1 + dB(θ̂)/dθ]² / E{[∂ log f(x; θ)/∂θ]²},    (12.8)

where f(x; θ) is the distribution of the sample and B(θ̂) the bias. It can be shown that for any estimator θ̂* of θ,

MSE(θ̂*, θ) ≥ CR(θ),

under the following regularity conditions on Φ:


(CR1) The set A = {x: f(x; θ) > 0} does not depend on θ.
(CR2) For each θ ∈ Θ, the derivatives ∂ⁱ log f(x; θ)/∂θⁱ, i = 1, 2, 3, exist for all x ∈ 𝒳.
(CR3) 0 < E{[∂ log f(x; θ)/∂θ]²} < ∞ for all θ ∈ Θ.

In the case of unbiased estimators the inequality takes the form

Var(θ̂*) ≥ (E{[∂ log f(x; θ)/∂θ]²})⁻¹;

the inverse of this lower bound is called Fisher's information and is denoted by Iₙ(θ).
Definition 3

An unbiased estimator θ̂ of θ is said to be (fully) efficient if

Var(θ̂) = (E{[∂ log f(x; θ)/∂θ]²})⁻¹ = Iₙ(θ)⁻¹.

That is, an unbiased estimator is efficient when its variance equals the Cramer–Rao lower bound.
In the example considered above, the distribution of the sample is

f(x; θ) = ∏ᵢ₌₁ⁿ f(xᵢ; θ) = ∏ᵢ₌₁ⁿ [1/√(2π)] exp{−½(xᵢ − θ)²} = (2π)^(−n/2) exp{−½ Σᵢ₌₁ⁿ (xᵢ − θ)²},



so that

log f(x; θ) = −(n/2) log(2π) − ½ Σᵢ₌₁ⁿ (xᵢ − θ)²,

and

E{[∂ log f(x; θ)/∂θ]²} = E{[Σᵢ₌₁ⁿ (Xᵢ − θ)]²} = n,

by independence.

An alternative way to derive the Cramer–Rao lower bound is to use the equality

E{[∂ log f(x; θ)/∂θ]²} = −E[∂² log f(x; θ)/∂θ²],    (12.9)

which holds true under CR1–CR3, where f(x; θ) is the 'true' density function. In the above example, ∂² log f(x; θ)/∂θ² = −n, and hence the equality holds.


So, for this example, CR(θ) = 1/n and, as can be seen from above, the only estimator which achieves the bound is θ̂₁, that is, Var(θ̂₁) = CR(θ); hence θ̂₁ is a fully efficient estimator. The properties of unbiasedness, relative efficiency and full efficiency enable us to reduce the number of the originally suggested estimators considerably. Moreover, by narrowing the class of estimators considered to the class of unbiased estimators, we succeeded in 'solving' the problem of 'no uniformly best estimator' discussed above in relation to the MSE criterion. This, however, is not very surprising, given that by assuming unbiasedness we exclude the bias term, which is largely responsible for the problem.
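Both the information Iₙ(θ) = n and the attainment of the bound by θ̂₁ can be checked by simulation (an informal sketch, Python assumed; the values of θ and n are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 0.0, 25, 100_000
X = rng.normal(theta, 1.0, size=(reps, n))

score = (X - theta).sum(axis=1)   # d log f(x; theta)/d theta = sum (x_i - theta)
print("E[score^2] ≈", (score ** 2).mean(), "(theory: n =", n, ")")
print("Var(theta_1) ≈", X.mean(axis=1).var(), "(CR bound: 1/n =", 1 / n, ")")
```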

Sometimes the class of estimators considered is narrowed even further by requiring estimators to be unbiased as well as linear, that is, linear functions of the r.v.'s of the sample. For example, in the case of Example 1 above, θ̂₁, θ̂₂, θ̂₃, θ̂₄, θ̂₆ and θ̂₇ are linear estimators. Within the class of linear and unbiased estimators we can show that θ̂₁ has minimum variance. In order to show this, let us take a general linear estimator

θ̃ = c + Σᵢ₌₁ⁿ aᵢXᵢ,    (12.10)

which includes the above linear estimators as special cases, and determine the values of c and aᵢ, i = 1, 2, ..., n, which ensure that θ̃ is the best linear unbiased estimator (BLUE) of θ. Firstly, for θ̃ to be unbiased we must have E(θ̃) = θ, which implies that c = 0 and Σᵢ₌₁ⁿ aᵢ = 1. Secondly, since Var(θ̃) = Σᵢ₌₁ⁿ aᵢ²σ² = σ² Σᵢ₌₁ⁿ aᵢ², we must choose the aᵢ's so as to minimise Σᵢ₌₁ⁿ aᵢ², as well as satisfy Σᵢ₌₁ⁿ aᵢ = 1 (for unbiasedness).
Setting up the Lagrangian for this problem, we have

min l(a, λ) = Σᵢ₌₁ⁿ aᵢ² − λ(Σᵢ₌₁ⁿ aᵢ − 1),    (12.11)

∂l/∂aᵢ = 2aᵢ − λ = 0, i.e. aᵢ = λ/2, i = 1, 2, ..., n.

Summing over i,

Σᵢ₌₁ⁿ aᵢ = n(λ/2) = 1 ⟹ λ = 2/n, i.e. aᵢ = 1/n, i = 1, 2, ..., n.

For c = 0 and aᵢ = 1/n, i = 1, 2, ..., n, θ̃ = (1/n) Σᵢ₌₁ⁿ Xᵢ, which is identical to θ̂₁. Hence θ̂₁ is BLUE (it has minimum variance among the class of linear and unbiased estimators of θ). This result will be of considerable interest in Chapter 21, in relation to the so-called Gauss–Markov theorem.
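The Lagrangian argument can be illustrated by comparing the equal weights aᵢ = 1/n with other weightings satisfying the unbiasedness restriction Σᵢ aᵢ = 1; since σ² = 1 here, Var(θ̃) = Σᵢ aᵢ² (a minimal sketch, Python assumed, with the alternative weightings chosen arbitrarily):

```python
import numpy as np

n = 10
a_blue = np.full(n, 1.0 / n)             # Lagrangian solution: a_i = 1/n
a_first = np.r_[1.0, np.zeros(n - 1)]    # theta_tilde = X_1
a_decay = np.arange(n, 0, -1.0)
a_decay /= a_decay.sum()                 # decreasing weights, summing to one

for name, a in [("equal (BLUE)", a_blue), ("X_1 only", a_first), ("decaying", a_decay)]:
    print(f"{name}: sum(a) = {a.sum():.2f}, Var = sum(a^2) = {(a ** 2).sum():.4f}")
```

Every admissible weighting other than aᵢ = 1/n yields a strictly larger Σᵢ aᵢ², in line with the result above.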

The properties of unbiasedness and efficiency can be generalised directly to the multiparameter case, where θ ≡ (θ₁, ..., θₘ)′. θ̂ is said to be an unbiased estimator of θ if

E(θ̂) = θ, i.e. E(θ̂ᵢ) = θᵢ, i = 1, 2, ..., m.

In the case of full efficiency, we can show that the Cramer–Rao inequality for an unbiased estimator θ̂ of θ takes the form

Cov(θ̂) ≥ (E{[∂ log f(x; θ)/∂θ][∂ log f(x; θ)/∂θ]′})⁻¹,    (12.12)

or

Var(θ̂ᵢ) ≥ [Iₙ(θ)⁻¹]ᵢᵢ, i = 1, 2, ..., m

(m being the number of parameters), where [Iₙ(θ)⁻¹]ᵢᵢ represents the ith diagonal element of the inverse of the matrix

Iₙ(θ) = E{[∂ log f(x; θ)/∂θ][∂ log f(x; θ)/∂θ]′} = −E[∂² log f(x; θ)/∂θ∂θ′],    (12.13)

called the sample information matrix, the second equality holding under the restrictions CR1–CR3. In order to illustrate these concepts, consider the following example:



Example 2

(i) Φ = {f(x; θ) = [1/(σ√(2π))] exp{−½[(x − μ)/σ]²}, θ ≡ (μ, σ²) ∈ ℝ × ℝ₊};
(ii) X ≡ (X₁, X₂, ..., Xₙ)′ is a random sample from f(x; θ).

In Example 1 discussed above we deduced that

μ̂ = (1/n) Σᵢ₌₁ⁿ Xᵢ    (12.14)

is a 'good' estimator of μ, and intuition suggests that, since μ̂ is in effect the sample moment corresponding to μ, the sample variance

σ̂² = (1/n) Σᵢ₌₁ⁿ (Xᵢ − μ̂)²    (12.15)

should be a 'good' estimator of σ². In order to check our intuition, let us examine whether σ̂² satisfies any of the properties discussed above.

E[Σᵢ₌₁ⁿ (Xᵢ − μ̂)²] = E{Σᵢ₌₁ⁿ [(Xᵢ − μ) − (μ̂ − μ)]²} = E{Σᵢ₌₁ⁿ [(Xᵢ − μ)² + (μ̂ − μ)² − 2(Xᵢ − μ)(μ̂ − μ)]}.    (12.16)

Since E(Xᵢ − μ)² = σ², E(μ̂ − μ)² = σ²/n and E[(Xᵢ − μ)(μ̂ − μ)] = σ²/n from independence, we can deduce that

E[Σᵢ₌₁ⁿ (Xᵢ − μ̂)²] = Σᵢ₌₁ⁿ (σ² + σ²/n − 2σ²/n) = (n − 1)σ².    (12.17)

This, however, implies that E(σ̂²) = [(n − 1)/n]σ² ≠ σ², that is, σ̂² is a biased estimator of σ². Moreover, it is clear that the estimator

s² = [1/(n − 1)] Σᵢ₌₁ⁿ (Xᵢ − μ̂)²    (12.18)

is unbiased. From Chapter 6.3 we also know that

(n − 1)s²/σ² ~ χ²(n − 1),    (12.19)



and thus

Var(s²) = [σ⁴/(n − 1)²] · 2(n − 1) = 2σ⁴/(n − 1),    (12.20)

since the variance of a chi-square r.v. equals twice its degrees of freedom.
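These expectations, together with the variance in (12.20), lend themselves to a quick numerical check (an informal Python sketch, with μ, σ² and n chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2, n, reps = 0.0, 4.0, 8, 200_000
X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

ss = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # sum (X_i - mu_hat)^2
print("E(sigma_hat^2) ≈", (ss / n).mean(),       "(theory:", (n - 1) / n * sigma2, ")")
print("E(s^2)         ≈", (ss / (n - 1)).mean(), "(theory:", sigma2, ")")
print("Var(s^2)       ≈", (ss / (n - 1)).var(),  "(theory:", 2 * sigma2**2 / (n - 1), ")")
```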

Let us now consider the question whether μ̂ ≡ X̄ₙ and s² are efficient estimators. The distribution of the sample is

f(x; θ) = ∏ᵢ₌₁ⁿ f(xᵢ; θ) = ∏ᵢ₌₁ⁿ [1/(σ√(2π))] exp{−½[(xᵢ − μ)/σ]²} = (2πσ²)^(−n/2) exp{−(1/2σ²) Σᵢ₌₁ⁿ (xᵢ − μ)²},    (12.21)
so that

log f(x; θ) = −(n/2) log(2π) − (n/2) log σ² − (1/2σ²) Σᵢ₌₁ⁿ (xᵢ − μ)²,

∂ log f(x; θ)/∂μ = (1/σ²) Σᵢ₌₁ⁿ (xᵢ − μ),

∂ log f(x; θ)/∂σ² = −n/(2σ²) + (1/2σ⁴) Σᵢ₌₁ⁿ (xᵢ − μ)².    (12.22)

Taking second derivatives and expectations gives the sample information matrix

Iₙ(θ) = −E[∂² log f(x; θ)/∂θ∂θ′] = diag(n/σ², n/(2σ⁴)),    (12.23)

since

∂² log f(x; θ)/∂μ² = −n/σ²,
∂² log f(x; θ)/∂μ∂σ² = −(1/σ⁴) Σᵢ₌₁ⁿ (xᵢ − μ),
∂² log f(x; θ)/∂(σ²)² = n/(2σ⁴) − (1/σ⁶) Σᵢ₌₁ⁿ (xᵢ − μ)²,    (12.24)

and hence

[Iₙ(θ)]⁻¹ = diag(σ²/n, 2σ⁴/n).    (12.25)




This clearly shows that although X̄ₙ achieves the Cramer–Rao lower bound, s² does not, since Var(s²) = 2σ⁴/(n − 1) > 2σ⁴/n. It turns out, however, that no other unbiased estimator exists which is relatively more efficient than s², although there are more efficient biased estimators, such as

σ̃² = [1/(n + 1)] Σᵢ₌₁ⁿ (Xᵢ − X̄ₙ)².    (12.26)
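Indeed, using Var(s²) = 2σ⁴/(n − 1) together with the scalings σ̂² = [(n − 1)/n]s² and σ̃² = [(n − 1)/(n + 1)]s², a little algebra gives MSE(s²) = 2σ⁴/(n − 1), MSE(σ̂²) = (2n − 1)σ⁴/n² and MSE(σ̃²) = 2σ⁴/(n + 1), so the three estimators are ranked in reverse order of their divisors. A small sketch (Python assumed; σ² and n are illustrative) makes the ranking visible:

```python
sigma2, n = 1.0, 10
mse = {
    "s^2            (divisor n-1)": 2 * sigma2**2 / (n - 1),
    "sigma_hat^2    (divisor n)":   (2 * n - 1) * sigma2**2 / n**2,
    "sigma_tilde^2  (divisor n+1)": 2 * sigma2**2 / (n + 1),
}
for name, value in mse.items():
    print(f"{name}: MSE = {value:.4f}")
```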

Efficiency can be seen as a property indicating that the estimator 'utilises' all the information contained in the statistical model. An important concept related to the information of a statistical model is that of a sufficient statistic. This concept was introduced by Fisher (1922) as a way to reduce the sampling information by discarding only the information of no relevance to any inference about θ. In other words, a statistic τ(X) is said to be sufficient for θ if it makes no difference whether we use X or τ(X) in inference concerning θ. Obviously, in such a case we would prefer to work with τ(X) instead of X, the former being of lower dimensionality.
Definition 4

A statistic τ(·): 𝒳 → ℝᵐ, n > m, is called sufficient for θ if the conditional distribution f(x/τ(x) = t) is independent of θ, i.e. θ does not appear in f(x/τ(x) = t) and the domain of f(·) does not involve θ.
In Example 1 above, intuition suggests that τ(X) = Σᵢ₌₁ⁿ Xᵢ must be a sufficient statistic for θ, since in constructing a 'very good' estimator of θ, namely θ̂₁, we only needed to know the sum of the sample and not the sample itself. That is, as far as inference about θ is concerned, knowing all the numbers (X₁, X₂, ..., Xₙ) or just Σᵢ₌₁ⁿ Xᵢ makes no difference. Verifying this directly, by deriving f(x/τ(x) = t) and showing that it is independent of θ, can be a very difficult exercise. One indirect way of verifying sufficiency is provided by the following lemma.

Fisher–Neyman factorisation lemma

The statistic τ(X) is sufficient for θ if and only if there exists a factorisation of the form

f(x; θ) = f(τ(x); θ) · h(x),    (12.27)

where f(τ(x); θ) is the density function of τ(X) and depends on θ, and h(x) is some function of x independent of θ.
Even this result, however, is of no great help, because we have to have the statistic τ(X), as well as its distribution, to begin with. The following method, suggested by Lehmann and Scheffé (1950), provides us with a very convenient way to derive minimal sufficient statistics. A sufficient statistic τ(X) is said to be minimal if the sample X cannot be reduced beyond τ(X) without losing sufficiency. They suggested choosing an arbitrary value x₀ in 𝒳 and forming the ratio

g(x, x₀; θ) = f(x; θ)/f(x₀; θ), x ∈ 𝒳, θ ∈ Θ,    (12.28)

and the values of x₀ which make g(x, x₀; θ) independent of θ define the required minimal sufficient statistics.

In Example 2 above,

g(x, x₀; θ) = exp{−(1/2σ²)[(Σᵢ₌₁ⁿ xᵢ² − Σᵢ₌₁ⁿ x₀ᵢ²) − 2μ(Σᵢ₌₁ⁿ xᵢ − Σᵢ₌₁ⁿ x₀ᵢ)]}.    (12.29)

This clearly shows that τ(X) = (Σᵢ₌₁ⁿ Xᵢ, Σᵢ₌₁ⁿ Xᵢ²) is a minimal sufficient statistic, since for the values of x₀ satisfying these two restrictions, g(x, x₀; θ) = 1. Hence, we can conclude that (X̄ₙ, s²), being simple functions of τ(X), are sufficient statistics. It is important to note that we cannot take Σᵢ₌₁ⁿ Xᵢ or Σᵢ₌₁ⁿ Xᵢ² separately as minimal sufficient statistics; they are jointly sufficient for θ = (μ, σ²).
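The Lehmann–Scheffé ratio can also be inspected numerically: any x₀ with the same Σxᵢ and Σxᵢ² as x (for instance, a permutation of x) makes g(x, x₀; θ) = 1 for every value of θ = (μ, σ²) (an informal Python sketch, with the sample and the θ values chosen arbitrarily):

```python
import numpy as np

def log_f(x, mu, sigma2):
    """Log density of a random sample x under N(mu, sigma2)."""
    return (-0.5 * x.size * np.log(2 * np.pi * sigma2)
            - ((x - mu) ** 2).sum() / (2 * sigma2))

rng = np.random.default_rng(4)
x = rng.normal(1.0, 2.0, size=6)
x0 = np.sort(x)[::-1]   # same values rearranged: same sum and sum of squares

for mu, s2 in [(0.0, 1.0), (2.5, 0.5), (-1.0, 3.0)]:
    print("log g(x, x0; theta) =", log_f(x, mu, s2) - log_f(x0, mu, s2))
```

Each printed value is zero (up to rounding), whatever θ is chosen.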

In contrast to unbiasedness and efficiency, sufficiency is a property of statistics in general, not just estimators, and it is inextricably bound up with the nature of Φ. For some parametric families of density functions, such as the exponential family of distributions, sufficient statistics exist; for other families they might not. Intuition suggests that, since efficiency is related to full utilisation of the information in the statistical model, and sufficiency can be seen as a maximal reduction of such information without losing any relevant information as far as inference about θ is concerned, there must be a direct relationship between the two properties. A relationship along the lines that, when an efficient estimator is needed, we should look no further than the sufficient statistics, is provided by the following lemma.
Rao–Blackwell lemma

Let τ(X) be a sufficient statistic for θ and t(X) be an estimator of θ; then

E[h(X) − θ]² ≤ E[t(X) − θ]², θ ∈ Θ,    (12.30)

where h(X) = E(t(X)/τ(X) = t), i.e. the conditional expectation of t(X) given τ(X) = t.


From the above discussion of the properties of unbiasedness, relative and full efficiency and sufficiency, we can see that these properties are directly related to the distribution of the estimator θ̂ of θ. As argued repeatedly, deriving the distribution of Borel functions of r.v.'s such as θ̂ = h(X) is a very difficult exercise, and very few results are available in the literature. These results are mainly related to simple functions of normally distributed r.v.'s (see Section 6.3). For the cases where no such results are available (which is the rule rather than the exception) we have to resort to asymptotic results. This implies that we need to extend the above list of criteria for 'good' estimators to include asymptotic properties of estimators. These asymptotic properties refer to the behaviour of θ̂ₙ as n → ∞. In order to emphasise the distinction between these asymptotic properties and the properties considered so far, we call the latter finite sample (or small sample) properties. The finite sample properties are related directly to the distribution of θ̂ₙ, say f(θ̂ₙ); the asymptotic properties, on the other hand, are related to the asymptotic distribution of θ̂ₙ.
12.2 Asymptotic properties

A natural property to require estimators to have is that as n → ∞ (i.e. as the sample size increases) the probability of θ̂ being close to the true value θ should increase as well. We formalise this idea using the concept of convergence in probability associated with the weak law of large numbers (WLLN) (see Section 9.2).

Definition 5

An estimator θ̂ₙ = h(X) is said to be consistent for θ if, for any ε > 0,

limₙ→∞ Pr(|θ̂ₙ − θ| < ε) = 1,    (12.31)

and we write θ̂ₙ →ᴾ θ.
This is in effect an extension of the WLLN for the sample mean X̄ₙ to some Borel function h(X). It is important to note that consistency does not refer to θ̂ₙ approaching θ in the sense of mathematical convergence; the convergence refers to the probability associated with the event |θ̂ₙ − θ| < ε, derived from the distribution of θ̂ₙ as n → ∞. Moreover, consistency is a very minimal property (although a very important one), since if θ̂ₙ is a consistent estimator of θ then so is θ̂ₙ* = θ̂ₙ + 7 405 926/n, which implies that for a small n the difference |θ̂ₙ* − θ| might be enormous, but the probability of this occurring decreases to zero as n → ∞.
Fig. 12.4 illustrates the concept in the case where θ̂ₙ has a well-behaved symmetric distribution which becomes more concentrated as n increases.

[Fig. 12.4. Consistency in the case of a symmetric uniformly converging distribution.]

The figure suggests that if the sampling distribution f(θ̂ₙ) becomes less and less dispersed as n → ∞ and eventually collapses at the point θ (i.e. becomes degenerate), then θ̂ₙ is a consistent estimator of θ. The following lemma formalises this argument.
Lemma 12.1

If θ̂ₙ is an estimator of θ which satisfies the following properties:

(i) limₙ→∞ E(θ̂ₙ) = θ;
(ii) limₙ→∞ Var(θ̂ₙ) = 0,

then θ̂ₙ →ᴾ θ.

It is important, however, to note that these are only sufficient conditions for consistency (not necessary); that is, consistency is not equivalent to the above conditions, since for consistency Var(θ̂ₙ) need not even exist. The above lemma, however, enables us to prove consistency in many cases of interest in practice. If we return to Example 1 above, we can see that

θ̂₁ →ᴾ θ, since Pr(|θ̂₁ − θ| < ε) ≥ 1 − 1/(nε²) → 1,


by Chebyshev's inequality and limₙ→∞ [1 − 1/(nε²)] = 1. Alternatively, using Lemma 12.1 we can see that both conditions are satisfied. Similarly, we can show that θ̂₂ ↛ᴾ θ ('↛ᴾ' reads 'does not converge in probability to'), θ̂₃ ↛ᴾ θ, θ̂₄ ↛ᴾ θ, θ̂₅ ↛ᴾ θ and θ̂₆ ↛ᴾ θ, but θ̂₇ →ᴾ θ. Moreover, for σ̂² and s² of Example 2 we can show that σ̂² →ᴾ σ² and s² →ᴾ σ².
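The contrast between θ̂₁ and θ̂₃ is easy to visualise by simulation: the probability of θ̂₁ falling within ε of θ rises towards one as n grows, while for θ̂₃ it stays put, since Var(θ̂₃) = ½ regardless of n (an informal Python sketch; θ, ε and the grid of sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, eps, reps = 0.0, 0.1, 10_000
for n in (10, 100, 1000):
    X = rng.normal(theta, 1.0, size=(reps, n))
    t1 = X.mean(axis=1)                # theta_1: consistent
    t3 = 0.5 * (X[:, 0] + X[:, -1])    # theta_3: variance fixed at 1/2
    p1 = np.mean(np.abs(t1 - theta) < eps)
    p3 = np.mean(np.abs(t3 - theta) < eps)
    print(f"n = {n:>4}: Pr(|theta_1 - theta| < eps) = {p1:.3f},"
          f" Pr(|theta_3 - theta| < eps) = {p3:.3f}")
```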

A stronger form of consistency, associated with almost sure convergence, is also a very desirable asymptotic property for estimators.

Definition 6

An estimator θ̂ₙ is said to be a strongly consistent estimator of θ if

Pr(limₙ→∞ θ̂ₙ = θ) = 1,

and this is denoted by θ̂ₙ →ᵃ·ˢ· θ.

The strong consistency of θ̂₁ in Example 1 is verified directly by the SLLN, and that of s² follows from the fact that it is a continuous function of the sample moments X̄ₙ and m₂ = (1/n) Σᵢ₌₁ⁿ Xᵢ² (see Chapter 10). Consistency and strong consistency can be seen as extensions of the weak law and the strong law of large numbers for Σᵢ₌₁ⁿ Xᵢ to the general statistic θ̂ₙ, respectively. Extending the central limit theorem to θ̂ₙ leads to the property of asymptotic normality.
Definition 7

An estimator θ̂ₙ is said to be asymptotically normal if two sequences {Vₙ(θ), n ≥ 1} and {θₙ, n ≥ 1} exist such that

Vₙ(θ)^(−½)(θ̂ₙ − θₙ) →ᴰ Z ~ N(0, 1).    (12.32)

This way of defining asymptotic normality presents several logical problems, deriving from the non-uniqueness of the sequences {Vₙ(θ)} and {θₙ}, n ≥ 1. A more useful definition can be used in the case where the order of magnitude (see Section 10.4) of Vₙ(θ) is known. In most cases of interest in practice, such as the case of a random sample, Vₙ(θ) is of order 1/n, denoted by Vₙ(θ) = O(1/n). In such a case asymptotic normality can be written in the following form:

√n(θ̂ₙ − θₙ) ~ N(0, V(θ)),    (12.33)

where '~' reads 'asymptotically distributed as' and V(θ) > 0 represents the asymptotic variance. In relation to this form of asymptotic normality we consider two further asymptotic properties.
Definition 8

An estimator θ̂ₙ with Var(θ̂ₙ) = O(1/n) is said to be asymptotically unbiased if

√n(θₙ − θ) → 0 as n → ∞.    (12.34)

This is automatically satisfied in the case of an asymptotically normal estimator θ̂ₙ with Var(θ̂ₙ) = Vₙ(θ) and E(θ̂ₙ) = θₙ. Thus, asymptotic normality can be written in the form

√n(θ̂ₙ − θ) ~ N(0, V(θ)).    (12.35)

It must be emphasised that asymptotic unbiasedness is a stronger condition than limₙ→∞ E(θ̂ₙ) = θ; the former specifies the rate of convergence.
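In Example 1 the statement √n(θ̂₁ − θ) ~ N(0, 1) holds exactly for every n, so a simulation simply reproduces the standard normal; the point of the sketch below (Python assumed; the sample sizes are arbitrary) is the √n scaling, which keeps the variance of the standardised quantity stable at V(θ) = 1 even though Var(θ̂₁) itself shrinks:

```python
import numpy as np

rng = np.random.default_rng(6)
theta, reps = 0.0, 50_000
for n in (5, 50, 500):
    xbar = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
    z = np.sqrt(n) * (xbar - theta)    # standardised estimator
    print(f"n = {n:>3}: var(z) = {z.var():.3f},"
          f" Pr(z <= 1.96) = {np.mean(z <= 1.96):.3f} (N(0,1): 0.975)")
```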

In relation to the variance of the asymptotic normal distribution we can define the concept of asymptotic efficiency.

Definition 9

An asymptotically normal estimator θ̂ₙ is said to be asymptotically efficient if V(θ) = I∞(θ)⁻¹, where

I∞(θ) = limₙ→∞ [(1/n) Iₙ(θ)],    (12.36)

i.e. the asymptotic variance achieves the limit of the Cramer–Rao lower bound (see Rothenberg (1973)).
At this stage it is important to distinguish between three different forms of the information matrix: the sample information matrix Iₙ(θ) (see (12.13)); the single observation one,

I(θ) = E{[∂ log f(x; θ)/∂θ][∂ log f(x; θ)/∂θ]′},

based on the density f(x; θ) of a single observation in (12.13); and the asymptotic information matrix I∞(θ) in (12.36).

12.3 Predictors and their properties

Consider the simple statistical model:

(i) Probability model: Φ = {f(x; θ) = [1/√(2π)] exp{−½(x − θ)²}, θ ∈ ℝ}, i.e. X ~ N(θ, 1);
(ii) Sampling model: X ≡ (X₁, X₂, ..., Xₙ)′ is a random sample.



Hence the distribution of the sample is

f(x₁, ..., xₙ; θ) = ∏ᵢ₌₁ⁿ f(xᵢ; θ).    (12.37)

Prediction of the value of X beyond the sample observations, say Xₙ₊₁, refers to the construction of a Borel function l(·) from the parameter space Θ to the observation space 𝒳:

l(·): Θ → 𝒳.    (12.38)

If θ is known, we can use the assumption that Xₙ₊₁ ~ N(θ, 1) to make probabilistic statements about Xₙ₊₁. Otherwise we need to estimate θ first and then use the estimate to construct l(·). In the present example we know from Sections 12.1 and 12.2 above that

θ̂ₙ = (1/n) Σᵢ₌₁ⁿ Xᵢ    (12.39)

is a 'good' estimator of θ. Intuition suggests that a 'good' predictor of Xₙ₊₁ might be l(θ̂ₙ) = θ̂ₙ, that is,

X̂ₙ₊₁ = θ̂ₙ.    (12.40)
The random variable X̂ₙ₊₁ = l(θ̂ₙ) is called the predictor of Xₙ₊₁, and its value the prediction. Note that the main difference between estimation and prediction is that in the latter case what we are 'estimating' (Xₙ₊₁) is itself a random variable, not a constant parameter θ.
In order to consider the optimal properties of a predictor X̂ₙ₊₁ = l(θ̂ₙ), we define the prediction error to be

eₙ₊₁ = Xₙ₊₁ − X̂ₙ₊₁.    (12.41)

Given that both Xₙ₊₁ and X̂ₙ₊₁ are random variables, eₙ₊₁ is also a random variable and has its own distribution. Using the expectation operator with respect to the distribution of eₙ₊₁, we can define the following properties:

(1) Unbiasedness. The predictor X̂ₙ₊₁ of Xₙ₊₁ is said to be unbiased if

E(eₙ₊₁) = 0.    (12.42)

(2) Minimum MSE. The predictor X̂ₙ₊₁ of Xₙ₊₁ is said to be minimum mean square error if

E(eₙ₊₁²) ≡ E(Xₙ₊₁ − X̂ₙ₊₁)² ≤ E(Xₙ₊₁ − X̃ₙ₊₁)²    (12.43)

for any other predictor X̃ₙ₊₁ of Xₙ₊₁.

Another property of predictors commonly used in practice is linearity.

(3) Linearity. The predictor X̂ₙ₊₁ of Xₙ₊₁ is said to be linear if l(·) is a linear function of the sample.



In the case of the example considered above we can deduce that

eₙ₊₁ ~ N(0, 1 + 1/n),    (12.44)

given that eₙ₊₁ is a linear function of normally distributed r.v.'s, eₙ₊₁ = Xₙ₊₁ − (1/n) Σᵢ₌₁ⁿ Xᵢ. Hence, X̂ₙ₊₁ is both linear and unbiased. Moreover, using the same procedure as in Section 13.1 for linear least-squares estimators, we can show that X̂ₙ₊₁ is also minimum MSE among the class of linear unbiased predictors. The above properties of predictors are directly related to the same properties for estimators discussed in Section 12.1. This is not surprising, however, given that a predictor can be viewed as an 'estimator' of a random variable which does not belong to the sample.
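The distribution in (12.44) is easily confirmed numerically: simulating the sample together with Xₙ₊₁ and forming the prediction error gives a mean near zero and a variance near 1 + 1/n (an informal Python sketch, with θ and n chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(7)
theta, n, reps = 1.0, 9, 100_000
X = rng.normal(theta, 1.0, size=(reps, n + 1))  # n sample values plus X_{n+1}

predictor = X[:, :n].mean(axis=1)               # X_hat_{n+1} = theta_hat_n
e = X[:, n] - predictor                         # prediction error e_{n+1}
print("E(e) ≈", e.mean(), " Var(e) ≈", e.var(), "(theory: 1 + 1/n =", 1 + 1 / n, ")")
```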

Important concepts

Estimator, estimate, unbiased estimator, bias, relative efficiency, mean square error, full efficiency, Cramer–Rao lower bound, information matrix, sufficient statistic, finite sample properties, asymptotic properties, consistency, strong consistency, asymptotic normality, asymptotic unbiasedness, asymptotic efficiency, BLUE.

Questions

1. Define the concept of an estimator as a mapping and contrast it with the concept of an estimate.
2. Define the finite sample properties of unbiasedness, relative and full efficiency, and sufficiency, and explain their meaning.
3. 'Underlying every expectation operator E(·) there is an implicit distribution.' Explain.
4. Explain the Cramer–Rao lower bound and the concept of the information matrix.
5. Explain the Lehmann–Scheffé method of constructing minimal sufficient statistics.
6. Contrast unbiasedness and efficiency with sufficiency.
7. Explain the difference between small sample and asymptotic properties.
8. Define and compare consistency and strong consistency.
9. Discuss the concept of asymptotic normality and its relationship to the order of magnitude of Var(θ̂ₙ).
10. Explain the concept of asymptotic efficiency in relation to asymptotically normal estimators. What happens when the asymptotic distribution is not normal?
11. Explain intuitively why √n(θₙ − θ) → 0 as n → ∞ is a stronger condition than limₙ→∞ E(θ̂ₙ) = θ.
12. Explain the relationships between Iₙ(θ), I(θ) and I∞(θ).

Exercises

1. Let X ≡ (X₁, X₂, ..., Xₙ)′ be a random sample from N(θ, 1) and consider the following estimators of θ:

θ̂₁ = X₁;
θ̂₂ = (1/n) Σᵢ₌₁ⁿ Xᵢ;
θ̂₃ = (1/k) Σᵢ₌₁ᵏ Xᵢ, k = 1, 2, ..., n−1;
θ̂₄ = ⅓X₁ + ⅔Xₙ;
θ̂₅ = θ̂₂ + θ̂₄.

(i) Derive the distribution of these estimators.
(ii) Using these distributions, consider the question whether these estimators satisfy the properties of unbiasedness, full efficiency and consistency.
(iii) Choose the relatively most efficient estimator.
2. Consider the estimator θ̂ₙ defined by

Pr(θ̂ₙ = 0) = n/(n + 1) and Pr(θ̂ₙ = n) = 1/(n + 1),

and show that:
(i) θ̂ₙ as defined above has a proper sampling distribution;
(ii) θ̂ₙ is a biased estimator of zero;
(iii) limₙ→∞ Var(θ̂ₙ) does not exist; and
(iv) θ̂ₙ is a consistent estimator of zero.

3. Let X ≡ (X₁, X₂, ..., Xₙ)′ be a random sample from N(0, σ²) and consider

σ̂² = (1/n) Σᵢ₌₁ⁿ Xᵢ²

as an estimator of σ².
(i) Derive the sampling distribution of σ̂² and show that it is an unbiased, consistent and fully efficient estimator of σ².