
PART III
Statistical inference

CHAPTER 11
The nature of statistical inference

11.1 Introduction

In the discussion of descriptive statistics in Part I it was argued that in order to be able to go beyond the mere summarisation and description of the observed data under consideration it was important to develop a mathematical model purporting to provide a generalised description of the data generating process (DGP). Motivated by the various results on frequency curves, a probability model in the form of the parametric family of density functions Φ = {f(x; θ), θ ∈ Θ} and its various ramifications was formulated in Part II, providing such a mathematical model. Along with the formulation of the probability model Φ various concepts and results were discussed in order to enable us to extend and analyse the model, preparing the way for statistical inference to be considered in the sequel. Before we go on to consider that, however, it is important to understand the difference between the descriptive study of data and statistical inference. As suggested above, the concept of a density function in terms of which the probability model is defined was motivated by the concept of a frequency curve. It is obvious that any density function f(x; θ) can be used as a frequency curve by reinterpreting it as a non-stochastic function of the observed data. This precludes any suggestions that the main difference between the descriptive study of data and statistical inference proper lies with the use of density functions in describing the observed data. 'What is the main difference then?'

In descriptive statistics the aim is to summarise and describe the data under consideration and frequency curves provide us with a convenient way to do that. The choice of a frequency curve is entirely based on the data in hand. On the other hand, in statistical inference a probability model Φ is

postulated a priori as a generalised description of the underlying DGP giving rise to the observed data (not the observed data themselves). Indeed, there is nothing stochastic about a set of numbers making up the data. The stochastic element is introduced into the framework in the form of uncertainty relating to the underlying DGP, and the observed data are viewed as one of the many possible realisations. In descriptive statistics we start with the observed data and seek a frequency curve which describes these data as closely as possible. In statistical inference we postulate a probability model Φ a priori, which purports to describe either the DGP giving rise to the data or the population which the observed data came from. These constitute fundamental departures from descriptive statistics, allowing us to make generalisations beyond the numbers in hand. This being the case, the analysis of observed data in statistical inference proper will take a very different form as compared with descriptive statistics briefly considered in Part I. In order to see this let us return to the income data discussed in Chapter 2. There we considered the summarisation and description of personal income data on 23 000 households using descriptors like the mean, median, mode, variance, the histogram and the frequency curve. These enabled us to get some idea about the distribution of incomes among these households. The discussion ended with us speculating about the possibility of finding an appropriate frequency curve which depends on a few parameters, enabling us to describe the data and analyse them in a much more convenient way. In Section 4.3 we suggested that the parametric family of density functions of the Pareto distribution

Φ = { f(x; θ) = (θ/4500)(4500/x)^{θ+1}, x ≥ 4500, θ ∈ Θ }    (11.1)

could provide a reasonable probability model for incomes over £4500. As can be seen, there is only one unknown parameter θ which, once specified, determines f(x; θ) completely. In the context of statistical inference we postulate Φ a priori as a stochastic model not of the data in hand but of the distribution of income of the population from which the observed data constitute one realisation, i.e. the UK households. Clearly, there is nothing wrong with using f(x; θ) as a frequency curve in the context of descriptive statistics by returning to the histogram of these data and, after plotting f(x; θ) for various values of θ, say θ = 1, 1.5, 2, choosing the one which comes closest to the frequency polygon. For the sake of the argument let us assume that the curve chosen is θ = 1.5, i.e.

f*(x) = (1.5/4500)(4500/x)^{2.5} = 452 804 x^{−2.5}.    (11.2)



This provides us with a very convenient descriptor of these data as can be

easily seen when compared with the cumbersome histogram function

f*(x) = φ_i/(x_{i+1} − x_i),  x ∈ [x_i, x_{i+1}),  i = 1, 2, ..., m,    (11.3)

where φ_i is the relative frequency of observations in the ith interval

(see Chapter 2). But it is no more than a convenient descriptor of the data in
hand. For example, we cannot make any statements about the distribution
of personal income in the UK on the basis of the frequency curve f*(x). In order to do that we need to consider the problem in the context of statistical inference proper. By postulating Φ above as a probability model for the distribution of income in the UK and interpreting the observed data as a sample from the population under study we could go on to consider questions about the unknown parameter θ as well as further observations from the probability model; see Section 11.4 below.
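To make the contrast concrete, the following sketch mimics the curve-fitting exercise just described: it overlays Pareto frequency curves for θ = 1, 1.5, 2 on a histogram. The income figures are simulated stand-ins (hypothetical data), not the 23 000 household incomes of Chapter 2, so the whole example is illustrative rather than a reproduction of the original analysis.

```python
# Compare candidate Pareto frequency curves with a histogram, by eye.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x0 = 4500.0                                 # lower income threshold
# Pareto(theta = 1.5) draws via the inverse CDF: x = x0 * (1 - U)^(-1/theta)
incomes = x0 * (1 - rng.uniform(size=5000)) ** (-1 / 1.5)

xs = np.linspace(x0, 40000, 400)
plt.hist(incomes, bins=60, range=(x0, 40000), density=True, alpha=0.4)
for theta in (1.0, 1.5, 2.0):               # candidate frequency curves
    fx = (theta / x0) * (x0 / xs) ** (theta + 1)
    plt.plot(xs, fx, label=f"theta = {theta}")
plt.legend(); plt.xlabel("income"); plt.ylabel("relative frequency")
plt.show()
```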
In Section 11.2 the important concept of a sampling model is introduced as a way to link the probability model postulated, say Φ = {f(x; θ), θ ∈ Θ}, to the observed data x = (x_1, ..., x_n)′ available. The sampling model provides the second important ingredient needed to define a statistical model, the starting point of any 'parametric' statistical inference.
In Section 11.3, armed with the concept of a statistical model, we go on to discuss a particular approach to statistical inference, known as the frequency approach. The frequency approach is briefly contrasted with another important approach to statistical inference, the Bayesian.
A brief overview of statistical inference is considered in Section 11.4 as a
prelude to the discussion of the next three chapters. The most important

concept in statistical inference is the concept of a statistic which is discussed
in Section 11.5. This concept and its distribution provide the cornerstone
for estimation, testing and prediction.

11.2 The sampling model

As argued above, the probability model Φ = {f(x; θ), θ ∈ Θ} constitutes a very important component of statistical inference. Another important element in the same context is what we call a sampling model, which provides the link between the probability model and the observed data. It is designed to model the relationship between them and refers to the way the observed data can be viewed in relation to Φ. In order to be able to formulate sampling models we need to define formally the concept of a sample in statistical inference.



Definition 1

A sample is defined to be a set of random variables (r.v.'s) (X_1, X_2, ..., X_n) whose density functions coincide with the 'true' density function f(x; θ) as postulated by the probability model.
Note that the term sample has a very precise meaning in this context and it
is not the meaning attributed in everyday language. In particular the term
does not refer to any observed data as the everyday use of the term might

suggest.

The significance of the concept becomes apparent when we learn that the observed data in this context are considered to be one of the many possible realisations of the sample. In this interpretation lies the inductive argument of statistical inference which enables us to extend the results based on the observed data in hand to the underlying mechanism giving rise to them. Hence the observed data in this context are no longer just a set of numbers we want to make some sense of; they represent a particular outcome of an experiment, the experiment as defined by the sampling model postulated to complement the probability model Φ = {f(x; θ), θ ∈ Θ}.

Given that a sample is a set of r.v.'s related to Φ it must have a distribution, which we call the distribution of the sample.

Definition 2

The distribution of the sample X = (X_1, ..., X_n)′ is defined to be the joint distribution of the r.v.'s X_1, ..., X_n, denoted by

f(x_1, ..., x_n; θ) ≡ f(x; θ).
The distribution of the sample incorporates both forms of relevant information, the probability as well as the sample information. It must come as no surprise to learn that f(x; θ) plays a very important role in statistical inference. The form of f(x; θ) depends crucially on the nature of the sampling model as well as Φ. The simplest but most widely used form of a sampling model is the one based on the idea of a random experiment ℰ (see Chapter 3) and is called a random sample.
Definition 3

A set of random variables (X_1, X_2, ..., X_n) is called a random sample from f(x; θ) if the r.v.'s X_1, X_2, ..., X_n are independent and identically distributed (IID). In this case the distribution of the



sample takes the form

f(x_1, x_2, ..., x_n; θ) = ∏_{i=1}^{n} f(x_i; θ) = [f(x; θ)]^n,

the first equality due to independence and the second to the fact that the r.v.'s are identically distributed.
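As a concrete illustration of this product form, here is a minimal sketch (using the Pareto model of Section 11.1) which evaluates the distribution of an IID sample by summing log-densities; working on the log scale is simply a numerical convenience to avoid underflow for large n.

```python
# Distribution of an IID sample: a product of one univariate density,
# evaluated here in log form.
import numpy as np

def pareto_logpdf(x, theta, x0=4500.0):
    # log f(x; theta) = log(theta/x0) + (theta + 1) * log(x0/x), x >= x0
    return np.log(theta / x0) + (theta + 1) * np.log(x0 / x)

def log_density_of_sample(x, theta):
    # log f(x_1,...,x_n; theta) = sum_i log f(x_i; theta)  (IID case)
    return np.sum(pareto_logpdf(np.asarray(x), theta))

print(log_density_of_sample([5000.0, 7200.0, 10100.0], theta=1.5))
```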
One of the important ingredients of a random experiment ℰ is that the experiment can be repeated under identical conditions. This enables us to construct a random sample by repeating the experiment n times. Such a procedure of constructing a random sample might suggest that this is feasible only when experimentation is possible. Although there is some truth in this presupposition, the concept of a random sample is also used in cases where the experiment can be repeated under identical conditions, if only conceptually. In order to see this let us consider the personal income example where Φ represents a Pareto family of density functions. 'What is a random sample in this case?' If we can ensure that every household in the UK has the same chance of being selected in one performance of a conceptual experiment then we can interpret the n households selected as representing a random sample (X_1, X_2, ..., X_n) and their incomes (the observed data) as being a realisation of the sample. In general we denote the
sample by X = (X_1, ..., X_n)′ and its realisation by x = (x_1, ..., x_n)′, where x is assumed to take values in the observation space 𝒳, i.e. x ∈ 𝒳; usually 𝒳 = ℝ^n.

A less restrictive form of a sampling model is what we call an independent sample, where the identically distributed condition in the random sample is relaxed.

Definition 4

A set of r.v.'s (X_1, ..., X_n) is said to be an independent sample from f(x_i; θ_i), i = 1, 2, ..., n, respectively, if the r.v.'s X_1, ..., X_n are independent. In this case the distribution of the sample takes the form

f(x_1, x_2, ..., x_n; θ) = ∏_{i=1}^{n} f(x_i; θ_i).    (11.4)

Usually the density functions f(x_i; θ_i), i = 1, 2, ..., n, belong to the same family but their numerical characteristics (moments, etc.) may differ.
If we relax the independence assumption as well we have what we can call a non-random sample.


Definition 5

A set of r.v.'s (X_1, ..., X_n) is said to be a non-random sample from f(x_1, ..., x_n; θ) if the r.v.'s X_1, ..., X_n are non-IID. In this case the only decomposition of the distribution of the sample possible is

f(x_1, x_2, ..., x_n; θ) = ∏_{i=1}^{n} f(x_i/x_1, ..., x_{i−1}; θ_i),    (11.5)

given x_0, where f(x_i/x_1, ..., x_{i−1}; θ_i), i = 1, 2, ..., n, represent the conditional distribution of X_i given X_1, X_2, ..., X_{i−1}.

A non-random sample is clearly the most general of the sampling models considered above and includes the independent and random samples as special cases, given that

f(x_i/x_1, ..., x_{i−1}; θ_i) = f(x_i; θ_i),  i = 1, 2, ..., n,    (11.6)

when X_1, ..., X_n are independent r.v.'s. Its generality, however, renders the
concept non-operational unless certain restrictions are imposed on the

heterogeneity and dependence among the X_i's. Such restrictions have been extensively discussed in Sections 8.2–3. In Part IV the restrictions often
used are stationarity and asymptotic independence.
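To illustrate the sequential decomposition (11.5), the sketch below evaluates the distribution of a sample from a stationary Gaussian AR(1) process, a standard example of a non-random sample; the AR(1) specification is our illustrative choice, not one introduced in the text above. For such a process the conditioning in each factor reduces to the immediately preceding value.

```python
# Sequential decomposition of the distribution of a dependent sample:
# f(x_1,...,x_n) = f(x_1) * prod_i f(x_i | x_{i-1})  for a Gaussian AR(1).
import numpy as np
from scipy.stats import norm

def ar1_log_density(x, phi=0.6, sigma=1.0):
    x = np.asarray(x)
    marginal_sd = sigma / np.sqrt(1 - phi**2)        # stationary marginal
    logf = norm.logpdf(x[0], loc=0.0, scale=marginal_sd)
    # each conditional density depends on the past only via x_{i-1}
    logf += np.sum(norm.logpdf(x[1:], loc=phi * x[:-1], scale=sigma))
    return logf

print(ar1_log_density([0.2, -0.1, 0.5, 0.4]))
```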
In the context of statistical inference we need to postulate both a
probability as well as a sampling model and thus we define a statistical
model as comprising both.

Definition 6

A statistical model is defined as comprising:

(i) a probability model Φ = {f(x; θ), θ ∈ Θ}; and
(ii) a sampling model X = (X_1, X_2, ..., X_n)′.

The concept of a statistical model provides the starting point of all forms of
statistical inference to be considered in the sequel. To be more precise, the
concept of a statistical model forms the basis of what is known as

parametric inference. There is also a branch of statistical inference known as non-parametric inference where no Φ is assumed a priori (see Gibbons (1971)). Non-parametric statistical inference is beyond the scope of this book.
It must be emphasised at the outset that the two important components
It must be emphasised at the outset that the two important components

of a statistical model, the probability and sampling models, are clearly

interrelated. For example, we cannot postulate the probability model Φ = {f(x; θ), θ ∈ Θ} if the sample X is non-random. This is because if the r.v.'s X_1, ..., X_n are not independent the probability model must be defined in terms of their joint distribution, i.e. Φ = {f(x_1, ..., x_n; θ), θ ∈ Θ}. Moreover, in the case of an independent but not identically distributed sample we need to specify the individual density functions for each r.v. in the sample, i.e. Φ = {f_k(x_k; θ_k), θ_k ∈ Θ, k = 1, 2, ..., n}. The most important implication of this relationship is that when the sampling model postulated is found to be inappropriate it means that the probability model has to be respecified as well. Several examples of this are encountered in Chapters 21 to 23.

11.3 The frequency approach

In developing the concept of a probability model in Part II it was argued that no interpretation of probability was needed. The whole structure was built upon the axiomatic approach which defined probability as a set function P(·): ℱ → [0, 1] satisfying various axioms and devoid of any interpretation (see Section 3.2). In statistical inference, however, the interpretation of the notion of probability is indispensable. The discerning reader will have noted that in the above introductory discussion we have already adopted a particular attitude towards the meaning of probability. In interpreting the observed data as one of many possible realisations of the DGP as represented by the probability model we have committed ourselves to the frequency interpretation of probability. This is because we implicitly assumed that if we were to repeat the experiment under identical conditions indefinitely (i.e. with the number of observations going to infinity) we would be able to reconstruct the probability model Φ. In the case of the income example discussed above, this amounts to assuming that if we were to observe everybody's income and plot the relative frequency curve for incomes over £4500 we would get a Pareto density function. This suggests that the frequency approach to statistical inference can be viewed as a natural extension of the descriptive study of data with the introduction of the concept of a probability model. In practice we never have an infinity of observations with which to recover the probability model completely and hence caution should be exercised in interpreting the results of the frequency-approach-based statistical methods which we consider in the sequel. These results depend crucially on the probability model which we interpret as referring to a situation where we keep on repeating the experiment to infinity. This suggests that the results should be interpreted as holding under the same circumstances, i.e. 'in the long run' or 'on average'. Adopting such an interpretation implies that we should propose statistical procedures which give rise to 'optimum results' according to

Fig. 11.1. The frequentist approach to statistical inference: the probability model Φ = {f(x; θ), θ ∈ Θ} and the sampling model X = (X_1, X_2, ..., X_n)′ combine into the distribution of the sample f(x_1, x_2, ..., x_n; θ), which links Φ to the observed data x = (x_1, x_2, ..., x_n)′.

criteria related to this 'long-run' interpretation. Hence, it is important to keep this in mind when reading the following chapters on criteria for optimal estimators, tests and predictors.

The various approaches to statistical inference based on alternative interpretations of the notion of probability differ mainly in relation to what constitutes relevant information for statistical inference and how it should be processed. In the case of the frequency approach (sometimes called the classical approach) the relevant information comes in the form of a probability model Φ = {f(x; θ), θ ∈ Θ} and a sampling model X = (X_1, X_2, ..., X_n)′, providing the link between Φ and the observed data x = (x_1, x_2, ..., x_n)′. The observed data are in effect interpreted as a realisation of the sampling model, i.e. X = x. This relevant information is then processed via the distribution of the sample f(x_1, x_2, ..., x_n; θ) (see Fig. 11.1).
The 'subjective' interpretation of probability, on the other hand, leads to a different approach to statistical inference. This is commonly known as the Bayesian approach because the discussion is based on revising prior beliefs about the unknown parameters θ in the light of the observed data using Bayes' formula. The prior information about θ comes in the form of a probability distribution f(θ); that is, θ is assumed to be a random variable. The revision to the prior f(θ) comes in the form of the posterior distribution f(θ/x) via Bayes' formula:

f(θ/x) = f(x/θ)f(θ)/f(x) ∝ f(x/θ)f(θ),    (11.7)


f(x/θ) being the distribution of the sample and f(x) being constant for X = x. For more details and an excellent discussion of the frequency and Bayesian approaches to statistical inference see Barnett (1973). In what follows we concentrate exclusively on the frequency approach.
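Before moving on, a minimal numerical sketch of Bayes' formula (11.7) may help fix the idea, using the Pareto model of Section 11.1 with θ discretised over a grid; the grid, the flat prior and the data values are all illustrative assumptions.

```python
# Posterior proportional to likelihood times prior, on a grid over theta.
import numpy as np

x0 = 4500.0
data = np.array([5200.0, 6900.0, 9800.0, 5600.0])
thetas = np.linspace(0.5, 4.0, 351)                 # grid over the parameter
prior = np.ones_like(thetas) / thetas.size          # flat prior f(theta)

# log-likelihood f(x/theta) for the IID Pareto sample, per grid point
loglik = np.array([np.sum(np.log(t / x0) + (t + 1) * np.log(x0 / data))
                   for t in thetas])
posterior = prior * np.exp(loglik - loglik.max())   # f(theta/x) up to a constant
posterior /= posterior.sum()                        # normalise; f(x) drops out

print("posterior mode:", thetas[np.argmax(posterior)])
```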

11.4 An overview of statistical inference

As defined above, the simplest form of a statistical model comprises:

(i) a probability model Φ = {f(x; θ), θ ∈ Θ}; and
(ii) a sampling model X = (X_1, X_2, ..., X_n)′ – a random sample.

Using this simple statistical model, let us attempt a brief overview of statistical inference before we consider the various topics individually, in order to keep the discussion which follows in perspective. The statistical model in conjunction with the observed data enables us to consider the following questions:

(1) Are the observed data consistent with the postulated statistical model? (misspecification)
(2) Assuming that the statistical model postulated is consistent with the observed data, what can we infer about the unknown parameters θ ∈ Θ?
(a) Can we decrease the uncertainty about θ by reducing the parameter space from Θ to Θ_0, where Θ_0 is a subset of Θ? (confidence estimation)
(b) Can we decrease the uncertainty about θ by choosing a particular value in Θ, say θ̂, as providing the most representative value of θ? (point estimation)
(c) Can we consider the question that θ belongs to some subset Θ_0 of Θ? (hypothesis testing)
(3) Assuming that a particular representative value θ̂ of θ has been chosen, what can we infer about further observations from the DGP as described by the postulated statistical model? (prediction)

The above questions describe the main areas of statistical inference. Comparing these questions with the ones we could ask in the context of descriptive statistics, we can easily appreciate the role of probability theory in statistical inference.

The second question posed above (the first question is considered in the appendix below) assumes that the statistical model postulated is 'valid' and considers various forms of inference relating to the unknown parameters θ.

Point estimation (or just estimation): refers to our attempt to give a numerical value to θ. This entails constructing a mapping h(·): 𝒳 → Θ (see Fig. 11.2). We call the function h(X) an estimator of θ and its value h(x) an estimate of θ. Chapters 12 and 13 on point estimation deal with the issues of defining and constructing 'optimal' estimators, respectively.

Fig. 11.2. Point estimation.

Fig. 11.3. Interval estimation.
Confidence estimation: refers to the construction of a numerical region for θ, in the form of a subset Θ_0 of Θ (see Fig. 11.3). Again, confidence estimation comes in the form of a multivalued function (one-to-many) g(·): 𝒳 → Θ.

Hypothesis testing, on the other hand, relates to some a priori statement about θ of the form H_0: θ ∈ Θ_0, or some opposite statement H_1: θ ∈ Θ_1, where Θ_0 ∩ Θ_1 = ∅ and Θ_0 ∪ Θ_1 = Θ. In a situation like this we need to devise a rule which tells us when to accept H_0 as 'valid' or reject H_0 as 'invalid' in view of the observed data. Using the postulated partition of Θ into Θ_0 and Θ_1 we need, in some sense, to construct a mapping q(·): 𝒳 → Θ whose inverse image induces the partition

q^{−1}(Θ_0) = C_0 – acceptance region,
q^{−1}(Θ_1) = C_1 – rejection region,

where C_0 ∪ C_1 = 𝒳 (see Fig. 11.4).


Fig. 11.4. Hypothesis testing.

The decision to accept H_0 as a valid hypothesis about θ, or reject H_0 as an invalid hypothesis about θ, in view of the observed data, will be based on whether the observed data x belong to the acceptance or the rejection region respectively, i.e. x ∈ C_0 or x ∈ C_1 (see Chapter 14).
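As a foretaste of Chapter 14, the following sketch implements one such rule for the normal model: accept H_0: μ = μ_0 when the standardised distance of the sample mean from μ_0 is small, reject otherwise. The 5% cut-off of 1.96, based on the N(0, 1) distribution of the standardised mean, and the assumption of known σ are illustrative simplifications.

```python
# A schematic acceptance/rejection partition for H0: mu = mu0.
import numpy as np

def decide(x, mu0, sigma, c=1.96):
    n, x_bar = len(x), np.mean(x)
    tau = np.sqrt(n) * (x_bar - mu0) / sigma    # the test statistic
    return "x in C0: accept H0" if abs(tau) <= c else "x in C1: reject H0"

print(decide(np.array([10.4, 9.7, 10.9, 10.2, 10.6]), mu0=10.0, sigma=1.0))
```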
Hypothesis testing can also be used to consider the question of the appropriateness of the probability model postulated. Apart from the direct test based on the empirical cumulative distribution function (see Appendix 11.1) we can use indirect tests based on characterisation theorems. For example, if a particular parametric family is characterised by the form of its first three moments, then we can construct a test based on these. For several characterisation results related to the normal distribution see Mathai and Pederzoli (1977). Similarly, hypothesis testing can be used to assess the appropriateness of the sampling model as well (see Chapter 22).

As far as question 3 is concerned, we need to construct a mapping p(·): 𝒳 → 𝒳 which will provide us with further values of X not belonging to the sample X, for a given value of θ.

11.5 Statistics and their distributions

As can be seen from the bird's-eye view of statistical inference considered in the previous section, the problem is essentially one of constructing some mapping of the form

q(·): 𝒳 → Θ,    (11.8)

or its inverse, which satisfies certain criteria (restrictions) depending on the nature of the problem. Because of their importance in what follows such mappings will be given a very special name; we call them (sample) 'statistics'.



Definition 7

A statistic is said to be any Borel function (see Chapter 6)

q(·): 𝒳 → ℝ.

Note that q(·) does not depend on any unknown parameters.

Estimators, confidence intervals, rejection regions and predictors are all statistics which are directly related to the distribution of the sample. 'Statistics' are themselves random variables (r.v.'s), being Borel functions of r.v.'s or random vectors, and they have their own distributions. The discussion of criteria for optimum 'statistics' is largely in terms of their distributions.
Two important examples of statistics which we will encounter on numerous occasions in what follows are:
X̄_n = (1/n) Σ_{i=1}^{n} X_i, called the sample mean,    (11.9)

s² = (1/(n−1)) Σ_{i=1}^{n} (X_i − X̄_n)², called the sample variance.    (11.10)

On the other hand, the functions

h(X) = (1/n) Σ_{i=1}^{n} (X_i/σ)²    (11.11)

and

l(X) = (1/(n−1)) Σ_{i=1}^{n} (X_i − μ)²    (11.12)

are not statistics unless σ² and μ are known, respectively.
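The distinction is easy to see computationally: a statistic can be evaluated from a realisation x alone, whereas functions like (11.12) need the unknown μ as an extra input. A minimal sketch, with made-up numbers:

```python
# Statistics are computable from the data alone; (11.12) is not.
import numpy as np

x = np.array([4.2, 5.1, 3.8, 4.9, 5.5])   # a realisation of the sample

x_bar = x.mean()                           # sample mean (11.9)
s2 = x.var(ddof=1)                         # sample variance (11.10), 1/(n-1)

def l_of_x(x, mu):
    # depends on the unknown mu: not a statistic unless mu is known
    return np.sum((x - mu) ** 2) / (len(x) - 1)

print(x_bar, s2)
```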
The concept of a statistic can be generalised to a vector-valued function of the form

q(·): 𝒳 → Θ ⊆ ℝ^m,  m ≥ 1.    (11.13)

As with any random variable, any discussion relating to the nature of q(X) must be in terms of its distribution. Hence, it must come as no surprise to learn that statistical inference to a considerable extent depends critically on our ability to determine the distribution of a statistic q(X) from that of X = (X_1, X_2, ..., X_n)′, and determining such distributions is one of the most difficult problems in probability theory, as Chapter 6 clearly exemplified. In that chapter we discussed various ways to derive the distribution function of Y = q(X),

F(y) = Pr(q(X) ≤ y),    (11.14)


when the distribution of X is known, and several results were derived. The reader can now appreciate the reason he/she had to put up with some rather involved examples. All the results derived in that chapter will form the backbone of the discussion that follows. The discerning reader must have noted that most of these results are related to simple functions q(X) of normally distributed r.v.'s, X_1, X_2, ..., X_n. It turns out that most of the results in this area are related to this simple case. Because of the importance of the normal distribution, however, these results can take us a long way down 'statistical inference avenue'. Let us restate some of these results in terms of the statistics X̄_n and s² for reference purposes.
Example 1

Consider the following statistical model:

(i) Φ = { f(x; θ) = [1/(σ√(2π))] exp[ −(1/2)((x − μ)/σ)² ], θ ≡ (μ, σ²) ∈ ℝ × ℝ_+ };
(ii) X = (X_1, X_2, ..., X_n)′ is a random sample from f(x; θ).

For the statistics

X̄_n = (1/n) Σ_{i=1}^{n} X_i  and  s² = (1/(n−1)) Σ_{i=1}^{n} (X_i − X̄_n)²

the following distributional results hold:

(i) X̄_n ~ N(μ, σ²/n);    (11.15)
(ii) √n(X̄_n − μ)/σ ~ N(0, 1);    (11.16)
(iii) Σ_{i=1}^{n} [(X_i − μ)/σ]² ~ χ²(n);    (11.17)
(iv) (n − 1)s²/σ² ~ χ²(n − 1).    (11.18)

Note that

Σ_{i=1}^{n} [(X_i − μ)/σ]² = (n − 1)s²/σ² + n[(X̄_n − μ)/σ]²;    (11.19)

and

(v) Cov(X̄_n, s²) = 0, with √n(X̄_n − μ)/s ~ t(n − 1);    (11.20)
(vi) (s²/σ²)/(t²/τ²) ~ F(n − 1, m − 1),    (11.21)

where t² is the corresponding sample variance of a random sample (Z_1, Z_2, ..., Z_m) from N(μ_1, τ²) and s², t² are independent.

All these results follow from Lemmas 6.1-6.4 of Section 6.3 where the
normal, chi-square, Student’s t and F distributions are related.
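Results such as (11.16) and (11.18) can be checked by simulation. The sketch below draws many normal random samples and compares the simulated distributions of √n(X̄_n − μ)/σ and (n − 1)s²/σ² with N(0, 1) and χ²(n − 1) respectively; the parameter values and sample sizes are arbitrary choices.

```python
# Monte Carlo check of (11.16) and (11.18) for the normal random sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 2.0, 25, 20000
X = rng.normal(mu, sigma, size=(reps, n))

z = np.sqrt(n) * (X.mean(axis=1) - mu) / sigma      # should be N(0,1)
q = (n - 1) * X.var(axis=1, ddof=1) / sigma**2      # should be chi2(n-1)

print(stats.kstest(z, "norm").pvalue)               # should not be small
print(stats.kstest(q, "chi2", args=(n - 1,)).pvalue)
```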
Using the distribution of q(X), when known, as in the above cases, we can consider questions relating to the nature of this statistic, such as whether it provides a 'good' (to be defined in the sequel) or a 'bad' estimator or test statistic. Once this is decided we can go on to make probabilistic statements about θ, the 'true' parameter of Φ, which is what statistical inference is largely about. The question which naturally arises at this point is: 'What happens if we cannot determine the distribution of the statistic q(X)?' Obviously, without a distribution for q(X) no statistical inference is possible and thus it is imperative to 'solve' the problem of the distribution somehow.

In such cases the asymptotic theory developed in Chapters 9 and 10 comes to our rescue by offering us 'second best' solutions in the form of approximations to the distribution of q(X). In Chapter 9 we discussed various results related to the asymptotic behaviour of the statistic X̄_n, such as:

(i) X̄_n →^{a.s.} μ;
(ii) X̄_n →^{P} μ; and
(iii) √n(X̄_n − μ)/σ →^{D} Z ~ N(0, 1);    (11.22)

irrespective of the original distribution of the X_i's, given only that E(X_i) = μ and Var(X_i) = σ² < ∞; note that E(X̄_n) = μ. In Chapter 10 these results were extended to more general functions h(X), in particular to continuous functions of the sample raw moments

m_r = (1/n) Σ_{i=1}^{n} X_i^r,  r ≥ 1.    (11.23)



In relation to m_r it was shown that in the case of a random sample:

(i) m_r →^{a.s.} μ′_r;
(ii) m_r →^{P} μ′_r; and
(iii) √n(m_r − μ′_r)/σ_r →^{D} Z ~ N(0, 1),    (11.24)

for

μ′_r = ∫ x^r f(x) dx, with E(m_r) = μ′_r, σ_r² = μ′_{2r} − (μ′_r)²,  r ≥ 1,    (11.25)

assuming that μ′_{2r} < ∞.
It turns out that in practice the statistics q(X) of interest are often functions of these sample moments. Examples of such continuous functions of the sample raw moments are the sample central moments, defined by

μ̂_r = (1/n) Σ_{i=1}^{n} (X_i − X̄_n)^r,  r ≥ 1.    (11.26)

These provide us with a direct extension of the sample variance and they represent the sample equivalents to the central moments

μ_r = ∫ (x − μ)^r f(x) dx.    (11.27)

With the help of asymptotic theory we could generalise the above asymptotic results related to m_r, r ≥ 1, to those of Y_n = q(X) where q(·) is a Borel function. For example, we could show that under the same conditions

(i) μ̂_r →^{a.s.} μ_r;
(ii) μ̂_r →^{P} μ_r; and
(iii) √n(μ̂_r − μ_r)/σ_r →^{D} Z ~ N(0, 1),    (11.28)

where

σ_r² = μ_{2r} − μ_r² − 2rμ_{r−1}μ_{r+1} + r²μ_2μ_{r−1}²,    (11.29)

assuming that μ_{2r} < ∞; see exercise 1.
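For the case r = 2 the formula for σ_r² reduces to μ_4 − μ_2² (the terms involving μ_1 vanish since μ_1 = 0), which can be checked by simulation, as in the sketch below; the Exp(1) distribution is an arbitrary choice with all moments finite, for which μ_2 = 1 and μ_4 = 9.

```python
# Monte Carlo check of the r = 2 case of (11.28): the standardised sample
# second central moment should be approximately N(0, 1) for large n.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 400, 20000
X = rng.exponential(scale=1.0, size=(reps, n))

mu2_hat = ((X - X.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)
mu2, mu4 = 1.0, 9.0          # central moments of Exp(1)
z = np.sqrt(n) * (mu2_hat - mu2) / np.sqrt(mu4 - mu2**2)

print(z.mean(), z.var())     # should be close to 0 and 1
```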



Asymptotic results related to Y_n = q(X) can be used when the distribution of Y_n is not available (or is very difficult to use). Although there are many ways to obtain asymptotic results in particular cases, it is often natural to proceed by following the pattern suggested by the limit theorems in Chapter 9:

Step 1

Under certain conditions Y_n = q(X) can be shown to converge in probability (or almost surely) to some function h(θ) of θ, i.e.

Y_n →^{P} h(θ)  or  Y_n →^{a.s.} h(θ).    (11.30)

Step 2

Construct two sequences {h_n(θ), c_n(θ), n ≥ 1} such that

Y_n* = [Y_n − h_n(θ)]/c_n(θ) →^{D} Z ~ N(0, 1).    (11.31)

Let F_∞(y*) denote the asymptotic distribution of Y_n*; then for large n

F(y) ≈ F_∞(y*),    (11.32)

and F_∞(y*) can be used as the basis of any inference relating to Y_n = q(X). A question which naturally comes to mind is how large n should be to justify the use of these results. Commonly no answer is available, because the answer would involve the derivation of F(y), whose unavailability was the very reason we had to resort to asymptotic theory. In certain cases higher-order approximations based on asymptotic expansions can throw some light on this question (see Chapter 10). In general, caution should be exercised when asymptotic results are used for relatively small values of n, say n < 100.
Appendix 11.1 – The empirical distribution function

The first question posed in Section 11.4 relates to the validity of the probability and sampling models postulated. One way to consider the validity of the probability model postulated is via the empirical distribution function F*(x), defined by

F*(x) = (1/n)(number of x_i's ≤ x),  x ∈ ℝ.

Alternatively, if we define the random variable (r.v.) Z_i to be

Z_i = 1 if x_i ∈ (−∞, x], 0 otherwise,  x ∈ ℝ,

then F*(x) = (1/n) Σ_{i=1}^{n} Z_i. If the original distribution postulated in Φ is F(x), a reasonable thing to do is to compare it with F*(x). For example, consider the distance

D_n = max_{x∈ℝ} |F*(x) − F(x)|.

D_n as defined is a mapping of the form D_n(·): 𝒳 → [0, 1], where 𝒳 is the observation space. Given that Z_i has a Bernoulli distribution, F*(x), being the sum of Z_1, Z_2, ..., Z_n, is binomially distributed, i.e.

Pr(F*(x) = k/n) = (n choose k)[F(x)]^k[1 − F(x)]^{n−k},  k = 0, 1, ..., n,

where E(F*(x)) = F(x) and Var(F*(x)) = (1/n)F(x)[1 − F(x)].
Using the central limit theorem (see Section 9.3) we can show that

√n [F*(x) − F(x)] / √( F(x)[1 − F(x)] ) →^{D} Z ~ N(0, 1).

Using this asymptotic result it can be shown that √n D_n →^{D} Y, where

F(y) = 1 − 2 Σ_{k=1}^{∞} (−1)^{k−1} exp(−2k²y²),  y ∈ ℝ_+.

This asymptotic distribution of √n D_n can be used to test the validity of Φ; see Section 21.2.
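The following sketch computes F*(x) and the distance D_n for the Pareto model of Section 11.1, with simulated incomes standing in for real data; note that the maximum over x is attained at a jump point of F*(x), so it suffices to compare the two functions just before and just after each observation.

```python
# Empirical distribution function and the distance D_n for a Pareto model.
import numpy as np

rng = np.random.default_rng(3)
x0, theta, n = 4500.0, 1.5, 200
x = np.sort(x0 * (1 - rng.uniform(size=n)) ** (-1 / theta))

F_star = np.arange(1, n + 1) / n                 # F*(x) at the ordered x_i
F = 1 - (x0 / x) ** theta                        # postulated Pareto CDF
D_n = np.max(np.maximum(np.abs(F_star - F),      # just after each jump
                        np.abs(F_star - 1 / n - F)))  # just before each jump
print("sqrt(n) * D_n =", np.sqrt(n) * D_n)       # compare with the limit above
```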
Important concepts
Sample, the distribution of the sample, sampling model, random sample,
independent sample, non-random sample, observation space, statistical
model, empirical distribution function, point estimation, confidence
estimation, hypothesis testing, a statistic, sample mean, sample variance,
sample raw moments, sample central moments, the distribution of a
statistic, the asymptotic distribution of a statistic.

Questions

1. Discuss the difference between descriptive statistics and statistical inference.
2. Contrast f(x; θ) as a descriptor of observed data with f(x; θ) as a member of a parametric family of density functions.
3. Explain the concept of a sampling model and discuss its relationship to the probability model and the observed data.
4. Compare the sampling models:
(i) random sample;
(ii) independent sample;
(iii) non-random sample;
and explain the form of the distribution of the sample in each case.
5. Explain the concept of the empirical distribution function.
6. 'Estimation and hypothesis testing is largely a matter of constructing mappings of the form g(·): 𝒳 → Θ.' Discuss.
7. Explain why a statistic is a random variable.
8. Ensure that you understand the results (11.15)–(11.21) (see Appendix 6.1).
9. 'Being able to derive the distribution of statistics of interest is largely what statistical inference is all about.' Discuss.
10. Discuss the concept of a statistical model.
Exercises

1.* Using the results (11.22)–(11.29), show that for a random sample X from a distribution whose first four moments exist,

√n(μ̂_2 − μ_2) →^{D} Z ~ N(0, μ_4 − μ_2²).

Additional references

Barnett (1973); Bickel and Doksum (1977); Cramér (1946); Dudewicz (1976).