

Sampling Theory
and Methods

S. Sampath

CRC Press
Boca Raton London

New York Washington, D.C.

Narosa Publishing House
New Delhi Chennai

Mumbai Calcutta


S. Sampath
Department of Statistics
Loyola College, Chennai-600 034, India

Library of Congress Cataloging-in-Publication Data:

A catalog record for this book is available from the Library of Congress.

All rights reserved. No part of this publication may be reproduced, stored
in a retrieval system or transmitted in any form or by any means, electronic,
mechanical, photocopying, or otherwise, without the prior permission of the
copyright owner.


This book contains information obtained from authentic and highly regarded sources.
Reprinted material is quoted with permission, and sources are indicated. Reasonable
efforts have been made to publish reliable data and information, but the author and the
publisher cannot assume responsibility for the validity of all materials or for the
consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, microfilming, and recording,
or by any information storage or retrieval system, without prior permission in writing
from the publisher.

Exclusive distribution in North America only by CRC Press LLC

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton,
Florida 33431. E-mail: orders@crcpress.com

Copyright © 2001 Narosa Publishing House, New Delhi-110 017, India

No claim to original U.S. Government works
International Standard Book Number 0-8493-0980-8
Printed in India.


Dedicated to my
parents


Preface
This book is an outcome of nearly two decades of my teaching experience both
at the graduate and postgraduate level in Loyola College (Autonomous),
Chennai 600 034, during which I came across numerous books and research
articles on "Sample Surveys".
I have made an attempt to present the theoretical aspects of "Sample Surveys" in
a lucid form for the benefit of both undergraduate and postgraduate students of
Statistics.
The first chapter of the book introduces to the reader the basic concepts of Sampling
Theory which are essential to understand the later chapters. Some numerical
examples are also presented to help the readers to have a clear understanding of
the concepts. Simple random sampling design is dealt with in detail in the
second chapter. Several solved examples which consider various competing
estimators for the population total are also included in the same chapter. The
third chapter is devoted to systematic sampling schemes. Various systematic sampling
schemes like linear, circular, balanced and modified systematic sampling and their
performances under different superpopulation models are also discussed. In the
fourth chapter several unequal probability sampling-estimating strategies are
presented. Probability Proportional to Size Sampling With and Without
Replacement are considered with appropriate estimators. In addition to them,
the Midzuno sampling scheme and the Random Group Method are also included.
Stratified sampling, allocation problems and related issues are presented with
full details in the fifth chapter. Many interesting solved problems are also added.
In the sixth and seventh chapters the use of auxiliary information in ratio and
regression estimation is discussed. Results related to the properties of ratio and
regression estimators under super-population models are also given. Cluster
sampling and multistage sampling are presented in the eighth chapter. The
results presented under two-stage sampling are general in nature. In the ninth
chapter, non-sampling errors, randomised response techniques and related topics
are discussed. Some recent developments in sample surveys, namely estimation
of distribution functions, adaptive sampling schemes and randomised response
methods for quantitative data, are presented in the tenth chapter.
Many solved theoretical problems are incorporated into almost all the chapters,
which will help the readers acquire the necessary skills to solve problems of a
theoretical nature on their own.
I am indebted to the authorities of Loyola College for providing me the
necessary facilities to successfully complete this work. I also wish to thank
Dr. P. Chandrasekar, Department of Statistics, Loyola College, for his help during
proof correction. I wish to place on record the excellent work done by the
Production Department of Narosa Publishing House in formatting the
manuscript.

S. Sampath


Contents

Chapter 1  Preliminaries
1.1  Basic Definitions  1
1.2  Estimation of Population Total  3
1.3  Problems and Solutions  8

Chapter 2  Equal Probability Sampling
2.1  Simple Random Sampling  10
2.2  Estimation of Total  11
2.3  Problems and Solutions  16

Chapter 3  Systematic Sampling Schemes
3.1  Introduction  29
3.2  Linear Systematic Sampling  29
3.3  Schemes for Populations with Linear Trend  34
3.4  Autocorrelated Populations  39
3.5  Estimation of Variance  42
3.6  Circular Systematic Sampling  43
3.7  Systematic Sampling in Two Dimensions  44
3.8  Problems and Solutions  47

Chapter 4  Unequal Probability Sampling
4.1  PPSWR Sampling Method  55
4.2  PPSWOR Sampling Method  60
4.3  Random Group Method  63
4.4  Midzuno Scheme  67
4.5  PPS Systematic Scheme  70
4.6  Problems and Solutions  71

Chapter 5  Stratified Sampling
5.1  Introduction  76
5.2  Sample Size Allocation  79
5.3  Comparison with Other Schemes  86
5.4  Problems and Solutions  89

Chapter 6  Use of Auxiliary Information
6.1  Introduction  97
6.2  Ratio Estimation  97
6.3  Unbiased Ratio Type Estimators  100
6.4  Almost Unbiased Ratio Estimators  102
6.5  Jackknife Ratio Estimator  104
6.6  Bound for Bias  105
6.7  Product Estimation  106
6.8  Two Phase Sampling  108
6.9  Use of Multi-auxiliary Information  113
6.10  Ratio Estimation in Stratified Sampling  115
6.11  Problems and Solutions  117

Chapter 7  Regression Estimation
7.1  Introduction  122
7.2  Difference Estimation  124
7.3  Double Sampling in Difference Estimation  125
7.4  Multivariate Difference Estimator  126
7.5  Inference under Super Population Models  129
7.6  Problems and Solutions  137

Chapter 8  Multistage Sampling
8.1  Introduction  140
8.2  Estimation under Cluster Sampling  140
8.3  Multistage Sampling  143

Chapter 9  Non-sampling Errors
9.1  Incomplete Surveys  152
9.2  Randomised Response Methods  158
9.3  Observational Errors  161

Chapter 10  Recent Developments
10.1  Adaptive Sampling  165
10.2  Estimation of Distribution Functions  171
10.3  Randomised Response Methods for Quantitative Data  174

References  179

Index  183


Chapter 1


Preliminaries
1.1 Basic Definitions
Definition 1.1 "Finite Population" A finite population is nothing but a set
containing finite number of distinguishable elements.
The elements of a finite population will be entities possessing panicular
characteristics in which a sampler would be interested and they will be referred
to as population units. For example, in an agricultural study where one is
interested in finding the total yield, a collection of fields or a collection of plots
may be defined as population. In a socio-economic study, population units may
be defined as a group of individuals, streets or villages.
Definition 1.2 "Population Size" The number of elements in a finite population
is called population size. Usually it is denoted by Nand it is always a known
finite number.
With each unit in a population of size. N, a number from 1 through N is
assigned. These numbers are called labels of the units and they remain
unchanged throughout the study. The values of the population units with respect
to the characteristic y under study will be denoted by Y1 , Y2 , ... , YN. Here Y;
denotes the value of the unit bearing label i with respect to the variable y.
Defmition 1.3 "Parameter" Any real valued function of the population values
is called parameter.
1 N
1 N
For example, the population mean Y = S2 =
Y]2 and

L,r; , --IIli -

N i=l

N-1 i=l


population range R = Max {X; }- Min {X; } are parameters.

Definition 1.4 "Sample" A sample is nothing but a subset of the population S.
Usually it is denoted by s. The number of elements in a sample s is denoted
by n(s) and it is referred to as sample size.
Definition 1.5 "ProbabHity SampUng" Choosing a subset o( the population
according to a probability sampling design is called probability sampling.



Generally a sample is drawn to estimate the parameters whose values are
not known.

Definition 1.6 "Statistic" Any· real valued function is called statistic, if it
depends on Yt, Y2, .... YN only through s.
A statistic when used to estimate a parameter is referred to as estimator.

·Definition 1. 7 "Sampling Design" Let .Q be the collection of all subsets of S
and P(s) be a probability distribution defined on .Q. The probability distribution
{P(s),se .Q} is called sampling design.
A sampling design assigns probability of selecting a subset s as sample.
For example, let .Q be the collection of all (:] possible subsets of size n of the
populationS. The probability distribution
P(s)=

jCf

0

if n(s) = n
otherwise

is a sampling design. This design assigns probabilities

(Nl-1
II

for all subsets of

I

size n for being selected as sample and zero for all other subsets of S.
It is pertinent to note that the definition of sample as a subset of S does not
allow repetition. of units in the sample more than once. That is, the sample will
always contain distinct units. Alternatively one can also define a sequence
whose elements are members of S as a sample, in which case the sample will not
necessarily contain distinct units.

Definition 1.8 "Bias" Let PT(s) is unbiased for the parameter 8 with respect to the sampling design P(s) if

Ep[T(s)] =

L T(s)P(s) =8.

seD


The difference Ep[T(s)]-8 is called the bias of T(s) in estimating 8 with
respect to the design P(s). It is to be noted that an estimator which is unbiased
with respect to a sampling design P(s) is not necessarily unbiased with respect to
some other design Q(s).

Definition 1.9 "Mean Square Error" Mean square error of the estimator
T(s) in estimating 8 with respect to the design P(s) is defined as
MSE
= L[T(s) -8] 2 P(s)
seD



If $E_P[T(s)] = \theta$ then the mean square error reduces to the variance.
Given a parameter $\theta$, one can propose a number of estimators. For
example, to estimate the population mean one can use either the sample mean or
the sample median or any other reasonable sample quantity. Hence one requires
some criteria to choose an estimator. In sample surveys, we use either the bias or
the mean square error or both of them to evaluate the performance of an
estimator. Since the bias gives the weighted average of the difference between
the estimator and the parameter, and the mean square error gives the weighted squared
difference of the estimator and the parameter, it is always better to choose an
estimator which has smaller bias (if possible unbiased) and smaller mean square
error. The following theorem gives the relationship between the bias and mean
square error of an estimator.


Theorem 1.1 Under the sampling design P(s), any statistic T(s) satisfies the
relation $MSE(T : P) = V_P(T) + [B_P(T)]^2$, where $V_P(T)$ and $B_P(T)$ are the variance
and bias of the statistic T(s) under the sampling design P(s).

Proof
$$MSE(T : P) = E_P[T(s) - \theta]^2 = \sum_{s \in \Omega} [T(s) - \theta]^2 P(s)$$
$$= \sum_{s \in \Omega} [T(s) - E_P(T(s)) + E_P(T(s)) - \theta]^2 P(s)$$
$$= \sum_{s \in \Omega} [T(s) - E_P(T(s))]^2 P(s) + [E_P(T(s)) - \theta]^2$$
(the cross-product term vanishes since $\sum_{s \in \Omega} [T(s) - E_P(T(s))] P(s) = 0$)
$$= V_P(T) + [B_P(T)]^2$$
Hence the proof. •
As mentioned earlier, the performance of an estimator is evaluated on the basis
of its bias and mean square error. Another way to assess the
performance of a sampling design is the use of its entropy.
Definition 1.10 "Entropy" The entropy of the sampling design P(s) is defined as
$$e = -\sum_{s \in \Omega} P(s) \ln P(s).$$

Since the entropy is a measure of information corresponding to the given
sampling design, we prefer a sampling design having maximum entropy.
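As a quick numerical illustration, the following Python sketch computes the entropy of a small design directly from Definition 1.10; the design is the toy one that appears in Exercise 1.1 at the end of this chapter, and the population values play no role here, only the selection probabilities.

```python
from math import log

# The toy design of Exercise 1.1: each sample (a subset of S = {1, ..., 5})
# is stored together with its selection probability.
design = {
    frozenset({2, 3, 4}): 0.2, frozenset({2, 5}): 0.2,
    frozenset({1, 3, 5}): 0.3, frozenset({1, 4}): 0.3,
}
assert abs(sum(design.values()) - 1.0) < 1e-12  # a design is a probability distribution

# Definition 1.10: e = -sum P(s) ln P(s)
entropy = -sum(p * log(p) for p in design.values() if p > 0)
print(entropy)  # about 1.366
```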

1.2 Estimation of Population Total
In order to introduce the most popular Horvitz-Thompson estimator for the
population total we give the following definitions.


4

Sampling Theory and Methods

Definition 1.11 "Inclusion indicators" Let s 3 i denote the event that the
sample s contains the unit i . The random variables
I if s 1 i. IS. iS. N
l·(s)= {
·
'
0 otherwise

are called inclusion indicators.
Definition 1.12 "Inclusion Probabilities" The first and second order inclusion
probabilities corresponding to the sampling design P(s) are defined as
TC;

=L P(s). rcij = L P(s)
ni.j

s3i

where the sum

L

extends over all s containing i and the sum

L

extends

ni.j

Hi

over all s containing both i and j.

Theorem 1.2 For any sampling design P(s), (a) $E_P[I_i(s)] = \pi_i$, $i = 1, 2, \ldots, N$;
(b) $E_P[I_i(s) I_j(s)] = \pi_{ij}$, $i, j = 1, 2, \ldots, N$.

Proof (a) Let $\Omega_1$ be the collection of all subsets of S containing the unit with
label i and $\Omega_2 = \Omega - \Omega_1$.
$$E_P[I_i(s)] = \sum_{s \in \Omega_1} I_i(s) P(s) + \sum_{s \in \Omega_2} I_i(s) P(s)$$
$$= \sum_{s \in \Omega_1} 1 \cdot P(s) + \sum_{s \in \Omega_2} 0 \cdot P(s) = \sum_{s \ni i} P(s) = \pi_i$$
(b) Let $\Omega_1$ be the collection of all subsets of S containing the units with labels i
and j and $\Omega_2 = \Omega - \Omega_1$. Note that
$$I_i(s) I_j(s) = \begin{cases} 1 & \text{if } s \in \Omega_1 \\ 0 & \text{otherwise} \end{cases}$$
Therefore
$$E_P[I_i(s) I_j(s)] = \sum_{s \in \Omega_1} I_i(s) I_j(s) P(s) + \sum_{s \in \Omega_2} I_i(s) I_j(s) P(s)$$
$$= \sum_{s \in \Omega_1} P(s) = \sum_{s \ni i,j} P(s) = \pi_{ij}$$
Hence the proof. •
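Definition 1.12 and Theorem 1.2 can be checked mechanically for any design given in tabular form. The sketch below (Python, reusing the toy design of Exercise 1.1) computes all first and second order inclusion probabilities by direct summation, and along the way confirms the identity $E_P[n(s)] = \sum_i \pi_i$ established in Theorem 1.3 below.

```python
design = {
    frozenset({2, 3, 4}): 0.2, frozenset({2, 5}): 0.2,
    frozenset({1, 3, 5}): 0.3, frozenset({1, 4}): 0.3,
}
N = 5

# Definition 1.12: pi_i sums P(s) over samples containing i,
# pi_ij over samples containing both i and j.
pi = {i: sum(p for s, p in design.items() if i in s) for i in range(1, N + 1)}
pi2 = {(i, j): sum(p for s, p in design.items() if i in s and j in s)
       for i in range(1, N + 1) for j in range(1, N + 1) if i < j}

# Theorem 1.3 (below): E[n(s)] equals the sum of the first order probabilities.
expected_size = sum(p * len(s) for s, p in design.items())
assert abs(expected_size - sum(pi.values())) < 1e-12

print(pi)   # {1: 0.6, 2: 0.4, 3: 0.5, 4: 0.5, 5: 0.5}
print(pi2)
```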
Theorem 1.3 For any sampling design P(s), $E_P[n(s)] = \sum_{i=1}^{N} \pi_i$.

Proof For any sampling design, we know that
$$n(s) = \sum_{i=1}^{N} I_i(s)$$
Taking expectation on both sides, we get
$$E_P[n(s)] = \sum_{i=1}^{N} E_P[I_i(s)] = \sum_{i=1}^{N} \pi_i$$
Hence the proof. •

Theorem 1.4 (a) For $i = 1, 2, \ldots, N$, $V_P[I_i(s)] = \pi_i (1 - \pi_i)$.
(b) For $i, j = 1, 2, \ldots, N$, $i \neq j$, $\mathrm{cov}_P[I_i(s), I_j(s)] = \pi_{ij} - \pi_i \pi_j$.
The proof of this theorem is straightforward and hence left as an exercise.


Theorem 1.5 Under any sampling design satisfying $P[n(s) = n] = 1$ for all s,
(a) $n = \sum_{i=1}^{N} \pi_i$
(b) $\sum_{j=1, j \neq i}^{N} [\pi_i \pi_j - \pi_{ij}] = \pi_i (1 - \pi_i)$

Proof (a) The proof of this part follows from Theorem 1.3.
(b) Since $P[n(s) = n] = 1$ for every s, we have $\sum_{j \neq i} I_j(s) = n - I_i(s)$.
Hence by Theorem 1.4, we write
$$\pi_i (1 - \pi_i) = V_P[I_i(s)] = \mathrm{cov}_P[I_i(s), I_i(s)]$$
$$= \mathrm{cov}_P\Big[I_i(s),\, n - \sum_{j \neq i}^{N} I_j(s)\Big]$$
$$= -\sum_{j \neq i}^{N} \mathrm{cov}_P[I_i(s), I_j(s)]$$
$$= \sum_{j \neq i}^{N} [\pi_i \pi_j - \pi_{ij}]$$
Hence the proof. •
Using the first order inclusion probabilities, Horvitz and Thompson (1952)
constructed an unbiased estimator for the population total. Their estimator for
the population total is $\hat{Y}_{HT} = \sum_{i \in s} \frac{Y_i}{\pi_i}$. The following theorem proves that the
above estimator is unbiased for the population total and also gives its
variance.

Theorem 1.6 The Horvitz-Thompson estimator $\hat{Y}_{HT} = \sum_{i \in s} \frac{Y_i}{\pi_i}$ is unbiased for
the population total and its variance is
$$V_P(\hat{Y}_{HT}) = \sum_{i=1}^{N} Y_i^2 \left[\frac{1 - \pi_i}{\pi_i}\right] + 2 \sum_{i=1}^{N} \sum_{j>i}^{N} Y_i Y_j \left[\frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}\right]$$

Proof The estimator $\hat{Y}_{HT}$ can be written as
$$\hat{Y}_{HT} = \sum_{i=1}^{N} \frac{Y_i}{\pi_i} I_i(s)$$
Taking expectations on both sides we get
$$E_P[\hat{Y}_{HT}] = \sum_{i=1}^{N} \frac{Y_i}{\pi_i} \pi_i = Y$$
Therefore $\hat{Y}_{HT}$ is unbiased for the population total.
Consider the difference
$$\hat{Y}_{HT} - Y = \sum_{i=1}^{N} \frac{Y_i}{\pi_i} I_i(s) - \sum_{i=1}^{N} Y_i = \sum_{i=1}^{N} \frac{Y_i}{\pi_i} [I_i(s) - \pi_i]$$
Squaring both sides and taking expectations we get
$$E_P[\hat{Y}_{HT} - Y]^2 = \sum_{i=1}^{N} \left[\frac{Y_i}{\pi_i}\right]^2 E_P[I_i(s) - \pi_i]^2 + 2 \sum_{i=1}^{N} \sum_{j>i}^{N} \frac{Y_i Y_j}{\pi_i \pi_j} E_P[I_i(s) - \pi_i][I_j(s) - \pi_j]$$
$$= \sum_{i=1}^{N} \left[\frac{Y_i}{\pi_i}\right]^2 V_P[I_i(s)] + 2 \sum_{i=1}^{N} \sum_{j>i}^{N} \frac{Y_i Y_j}{\pi_i \pi_j} \mathrm{cov}_P(I_i(s), I_j(s))$$
$$= \sum_{i=1}^{N} Y_i^2 \left[\frac{1 - \pi_i}{\pi_i}\right] + 2 \sum_{i=1}^{N} \sum_{j>i}^{N} Y_i Y_j \left[\frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}\right]$$
Hence the proof. •

Remark When the sampling design is of fixed size, that is, $P[n(s) = n] = 1$ for all s, the variance of the Horvitz-Thompson estimator can also be expressed in
the following form:
$$V_P(\hat{Y}_{HT}) = \sum_{i=1}^{N} \sum_{j>i}^{N} \left[\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right]^2 [\pi_i \pi_j - \pi_{ij}]$$

Proof for Remark From the previous theorem, we have
$$V_P[\hat{Y}_{HT}] = \sum_{i=1}^{N} Y_i^2 \left[\frac{1 - \pi_i}{\pi_i}\right] + 2 \sum_{i=1}^{N} \sum_{j>i}^{N} Y_i Y_j \left[\frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}\right]$$
By Theorem 1.5(b), $\sum_{j \neq i} [\pi_i \pi_j - \pi_{ij}] = \pi_i (1 - \pi_i)$, so that, using the symmetry of $\pi_i \pi_j - \pi_{ij}$ in i and j,
$$\sum_{i=1}^{N} Y_i^2 \left[\frac{1 - \pi_i}{\pi_i}\right] = \sum_{i=1}^{N} \frac{Y_i^2}{\pi_i^2} \sum_{\substack{j=1 \\ j \neq i}}^{N} [\pi_i \pi_j - \pi_{ij}] = \frac{1}{2} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} \left[\frac{Y_i^2}{\pi_i^2} + \frac{Y_j^2}{\pi_j^2}\right] [\pi_i \pi_j - \pi_{ij}]$$
Therefore
$$V_P[\hat{Y}_{HT}] = \frac{1}{2} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} \left[\frac{Y_i^2}{\pi_i^2} + \frac{Y_j^2}{\pi_j^2}\right] [\pi_i \pi_j - \pi_{ij}] - \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} \frac{Y_i Y_j}{\pi_i \pi_j} [\pi_i \pi_j - \pi_{ij}]$$
$$= \frac{1}{2} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} \left[\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right]^2 [\pi_i \pi_j - \pi_{ij}]$$
$$= \sum_{i=1}^{N} \sum_{j>i}^{N} \left[\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right]^2 [\pi_i \pi_j - \pi_{ij}]$$
Hence the proof. •
The above form of the variance of the Horvitz-Thompson estimator is known as
the Yates-Grundy form. It helps us to get an unbiased estimator of the variance of the
Horvitz-Thompson estimator very easily. Consider any design yielding positive
second order inclusion probabilities for all pairs of units in the population. For
any such design, an unbiased estimator of the variance given above is
$$v(\hat{Y}_{HT}) = \sum_{i \in s} \sum_{\substack{j \in s \\ j > i}} \left[\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right]^2 \left[\frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}}\right]$$
The Horvitz-Thompson estimator, its variance and also the estimated variance
can be used for any sampling design yielding positive first order inclusion
probabilities for all the population units and positive second order inclusion
probabilities for all pairs of units. It is pertinent to note that the estimated
variance is liable to take negative values. A set of sufficient conditions for the
non-negativity of the estimated variance expression given above is, for every
pair of units i and j, $\pi_i \pi_j - \pi_{ij} \geq 0$ $(i, j = 1, 2, \ldots, N)$.
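The estimator, its Yates-Grundy variance form and the variance estimator translate directly into code. The following is a minimal Python sketch with invented population values; for concreteness it takes $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/(N(N-1))$, the inclusion probabilities derived for simple random sampling in Chapter 2, and confirms by complete enumeration that both the estimator and its variance estimator are unbiased.

```python
from itertools import combinations

# Invented population values; the design is simple random sampling of size n.
Y = [4.0, 7.0, 11.0, 17.0, 23.0]
N, n = len(Y), 3
samples = list(combinations(range(N), n))
P = 1.0 / len(samples)                 # every size-n subset is equally likely
pi = n / N                             # first order (Theorem 2.1, proved later)
pij = n * (n - 1) / (N * (N - 1))      # second order, constant under SRS

def ht_estimate(s):
    """Horvitz-Thompson estimate of the total from the sample s."""
    return sum(Y[i] / pi for i in s)

def yg_variance_estimate(s):
    """Yates-Grundy estimate of V(Y_HT) computed from the single sample s."""
    return sum((Y[i] / pi - Y[j] / pi) ** 2 * (pi * pi - pij) / pij
               for i, j in combinations(s, 2))

total = sum(Y)
true_var = sum(P * (ht_estimate(s) - total) ** 2 for s in samples)

# Complete enumeration: E[Y_HT] equals the total, E[v] equals the true variance.
assert abs(sum(P * ht_estimate(s) for s in samples) - total) < 1e-9
assert abs(sum(P * yg_variance_estimate(s) for s in samples) - true_var) < 1e-9
```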



Note For estimating the population total, apart from the Horvitz-Thompson estimator
several other estimators are also available in the literature and some of them are
presented in later chapters at appropriate places.

1.3 Problems and Solutions
Problem 1.1 Show that a necessary and sufficient condition for unbiased
estimation of a finite population total is that the first order inclusion probabilities
must be positive for all units in the population.
Solution When the first order inclusion probabilities are positive for all units,
one can use the Horvitz-Thompson estimator as an unbiased estimator for the
population total.
When the first order inclusion probability is zero for a particular unit, say
the unit with label i, the expected value of any statistic under the given sampling
design will be free from $Y_i$, its value with respect to the variable under study, and
hence no statistic can be unbiased for the total whatever value $Y_i$ may take.
Hence the first order inclusion probability must be positive for all units in the
population. •
Problem 1.2 Derive $\mathrm{cov}_P(\hat{X}, \hat{Y})$, where $\hat{Y}$ and $\hat{X}$ are the Horvitz-Thompson estimators of Y and X, the totals of the population units with respect to the variables y
and x respectively.

Solution For $i = 1, 2, \ldots, N$, let $Z_i = X_i + Y_i$ and let $\hat{Z}$ be the Horvitz-Thompson estimator of the total $Z = X + Y$.
Note that $\hat{Z} = \hat{X} + \hat{Y}$.
Therefore
$$V_P[\hat{Z}] = V_P[\hat{X}] + V_P[\hat{Y}] + 2\,\mathrm{cov}_P[\hat{X}, \hat{Y}] \tag{1.1}$$
By the remark given under Theorem 1.6 (taking the design to be of fixed size), we have
$$V_P[\hat{Z}] = \sum_{i=1}^{N} \sum_{j>i}^{N} \left[\frac{Z_i}{\pi_i} - \frac{Z_j}{\pi_j}\right]^2 [\pi_i \pi_j - \pi_{ij}]$$
Writing $\frac{Z_i}{\pi_i} - \frac{Z_j}{\pi_j} = \left[\frac{X_i}{\pi_i} - \frac{X_j}{\pi_j}\right] + \left[\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right]$ and expanding the square,
$$V_P[\hat{Z}] = \sum_{i=1}^{N} \sum_{j>i}^{N} \left[\frac{X_i}{\pi_i} - \frac{X_j}{\pi_j}\right]^2 [\pi_i \pi_j - \pi_{ij}] + \sum_{i=1}^{N} \sum_{j>i}^{N} \left[\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right]^2 [\pi_i \pi_j - \pi_{ij}]$$
$$+\, 2 \sum_{i=1}^{N} \sum_{j>i}^{N} \left[\frac{X_i}{\pi_i} - \frac{X_j}{\pi_j}\right] \left[\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right] [\pi_i \pi_j - \pi_{ij}] \tag{1.2}$$
Comparing (1.1) and (1.2) we get
$$\mathrm{cov}_P(\hat{X}, \hat{Y}) = \sum_{i=1}^{N} \sum_{j>i}^{N} \left[\frac{X_i}{\pi_i} - \frac{X_j}{\pi_j}\right] \left[\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right] [\pi_i \pi_j - \pi_{ij}]$$
Hence the solution. •
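The covariance expression can be verified by complete enumeration for any fixed-size design. The sketch below does so under simple random sampling, with invented x and y values; the function `ht` is our own shorthand for the Horvitz-Thompson estimator.

```python
from itertools import combinations

X = [1.0, 2.0, 3.0, 4.0, 5.0]       # invented x values
Y = [4.0, 7.0, 11.0, 17.0, 23.0]    # invented y values
N, n = 5, 3
samples = list(combinations(range(N), n))
P = 1.0 / len(samples)
pi = n / N
pij = n * (n - 1) / (N * (N - 1))

def ht(values, s):
    return sum(values[i] / pi for i in s)

# Covariance of the two Horvitz-Thompson estimators over the whole design ...
cov = sum(P * (ht(X, s) - sum(X)) * (ht(Y, s) - sum(Y)) for s in samples)
# ... against the closed form obtained in Problem 1.2.
formula = sum((X[i] / pi - X[j] / pi) * (Y[i] / pi - Y[j] / pi) * (pi * pi - pij)
              for i, j in combinations(range(N), 2))
assert abs(cov - formula) < 1e-9
```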



Exercises
1.1 Let S be a finite population containing 5 units. Calculate all first and
second order inclusion probabilities under the sampling design
$$P(s) = \begin{cases} 0.2 & \text{for } s = \{2, 3, 4\},\ s = \{2, 5\} \\ 0.3 & \text{for } s = \{1, 3, 5\},\ s = \{1, 4\} \\ 0 & \text{otherwise} \end{cases}$$
1.2 From a population containing five units with values 4, 7, 11, 17 and 23, four
units are drawn with the help of the design
$$P(s) = \begin{cases} 0.20 & \text{if } n(s) = 4 \\ 0 & \text{otherwise} \end{cases}$$
Compare the bias and mean square error of the sample mean and median in
estimating the population mean.
1.3 List all possible values of n(s) under the sampling design given in Exercise
1.1 and verify the relation $E_P[n(s)] = \sum_{i=1}^{N} \pi_i$.
1.4 Check the validity of the statement "Under any sampling design the sum
of first order inclusion probabilities is always equal to the sample size".
1.5 Check the validity of the statement "The Horvitz-Thompson estimator can be
used under any sampling design to obtain an unbiased estimate of the
population total".


Chapter 2

Equal Probability Sampling

2.1 Simple Random Sampling
This is one of the simplest and oldest methods of drawing a sample of size n
from a population containing N units. Let $\Omega$ be the collection of all $2^N$ subsets of
S. The probability sampling design
$$P(s) = \begin{cases} \binom{N}{n}^{-1} & \text{if } n(s) = n \\ 0 & \text{otherwise} \end{cases}$$
is known as the simple random sampling design.
In the above design each one of the $\binom{N}{n}$ possible sets of size n is given
equal probability of being selected as the sample. The above design can be
implemented by following the unit drawing procedure described below:
Choose n random numbers from 1 through N without replacement. Units
corresponding to the numbers selected as above will constitute the sample.
Now we shall prove that this sampling mechanism implements the sampling
design defined above.
Consider an arbitrary subset s of the population S whose members are $i_1, i_2, i_3, \ldots, i_n$. The probability of selecting the units $i_1, i_2, i_3, \ldots, i_n$ in the order
$i_1 \to i_2 \to i_3 \to \cdots \to i_n$ is
$$\frac{1}{N} \cdot \frac{1}{N-1} \cdot \frac{1}{N-2} \cdots \frac{1}{N-(n-1)}$$
Since the number of orders in which these n units can be realised is $n!$, the
probability of obtaining the set s as the sample is
$$n! \cdot \frac{1}{N} \cdot \frac{1}{N-1} \cdot \frac{1}{N-2} \cdots \frac{1}{N-(n-1)},$$
which reduces on simplification to $\binom{N}{n}^{-1}$.
Therefore we infer that the sampling mechanism described above will
implement the simple random sampling design.
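The unit drawing procedure is straightforward to program. A minimal Python sketch follows; the function name and interface are ours, chosen purely for illustration.

```python
import random

def srs_wor(N, n, rng=random):
    """Draw a simple random sample of size n without replacement from the
    labels 1..N by the unit drawing procedure described above."""
    remaining = list(range(1, N + 1))
    sample = []
    for _ in range(n):
        # pick one of the remaining labels with equal probability
        sample.append(remaining.pop(rng.randrange(len(remaining))))
    return set(sample)

print(srs_wor(10, 4))   # e.g. {2, 5, 7, 10}
```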



2.2 Estimation of Total
The following theorem gives the first and second order inclusion probabilities
under simple random sampling.

Theorem 2.1 Under simple random sampling, (a) for $i = 1, 2, \ldots, N$, $\pi_i = \frac{n}{N}$;
(b) for $i, j = 1, 2, \ldots, N$, $i \neq j$, $\pi_{ij} = \frac{n(n-1)}{N(N-1)}$.

Proof By definition
$$\pi_i = \sum_{s \ni i} P(s) = \sum_{s \ni i} \binom{N}{n}^{-1}$$
Since there are $\binom{N-1}{n-1}$ subsets of size n with i as an element, the above sum reduces to
$$\pi_i = \binom{N-1}{n-1} \binom{N}{n}^{-1},$$
which is equal to $\frac{n}{N}$.
Again by definition, we have
$$\pi_{ij} = \sum_{s \ni i,j} P(s) = \sum_{s \ni i,j} \binom{N}{n}^{-1}$$
Since there are $\binom{N-2}{n-2}$ subsets of size n with both i and j as elements, we get
$$\pi_{ij} = \binom{N-2}{n-2} \binom{N}{n}^{-1} = \frac{n(n-1)}{N(N-1)}$$
Hence the proof. •
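Both inclusion probabilities can be confirmed by enumerating all $\binom{N}{n}$ samples, as in the following short sketch (the values of N and n are arbitrary choices for the check).

```python
from itertools import combinations

N, n = 6, 3
samples = list(combinations(range(1, N + 1), n))
P = 1.0 / len(samples)

pi_1 = sum(P for s in samples if 1 in s)              # first order, unit 1
pi_12 = sum(P for s in samples if 1 in s and 2 in s)  # second order, pair (1, 2)

assert abs(pi_1 - n / N) < 1e-12
assert abs(pi_12 - n * (n - 1) / (N * (N - 1))) < 1e-12
```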

The following theorem gives an unbiased estimator for the population total and
also its variance.

Theorem 2.2 Under simple random sampling, $\hat{Y}_{srs} = \frac{N}{n} \sum_{i \in s} Y_i$ is unbiased for
the population total and its variance is
$$V(\hat{Y}_{srs}) = \frac{N^2 (N-n)}{Nn} S_y^2, \quad \text{where } S_y^2 = \frac{1}{N-1} \sum_{i=1}^{N} [Y_i - \bar{Y}]^2.$$

Proof We have seen in Chapter 1 that for any sampling design
$$\hat{Y}_{HT} = \sum_{i \in s} \frac{Y_i}{\pi_i} \tag{2.1}$$
is unbiased for the population total with variance
$$\sum_{i=1}^{N} \sum_{j>i}^{N} \left[\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right]^2 (\pi_i \pi_j - \pi_{ij}) \tag{2.2}$$
By Theorem 2.1, we have $\pi_i = \frac{n}{N}$ and $\pi_{ij} = \frac{n(n-1)}{N(N-1)}$.
Substituting these values in (2.1) we notice that
$$\hat{Y}_{HT} = \frac{N}{n} \sum_{i \in s} Y_i \tag{2.3}$$
is unbiased for the population total.
Note that
$$\pi_i \pi_j - \pi_{ij} = \frac{n^2}{N^2} - \frac{n(n-1)}{N(N-1)} = \frac{n(N-n)}{N^2 (N-1)} \tag{2.4}$$
Therefore by (2.2)
$$V(\hat{Y}_{HT}) = \sum_{i=1}^{N} \sum_{j>i}^{N} \frac{N^2}{n^2} (Y_i - Y_j)^2 \cdot \frac{n(N-n)}{N^2 (N-1)} = \frac{N-n}{n(N-1)} \sum_{i=1}^{N} \sum_{j>i}^{N} (Y_i - Y_j)^2 \tag{2.5}$$
We know that
$$\sum_{i=1}^{N} \sum_{j>i}^{N} (Y_i - Y_j)^2 = \frac{1}{2} \left\{ 2N \sum_{i=1}^{N} Y_i^2 - 2 \sum_{i=1}^{N} \sum_{j=1}^{N} Y_i Y_j \right\} = N \sum_{i=1}^{N} Y_i^2 - N^2 \bar{Y}^2$$
Using the above identity on the right hand side of (2.5), we get
$$V(\hat{Y}_{HT}) = \frac{N-n}{n(N-1)} \left\{ N \sum_{i=1}^{N} Y_i^2 - N^2 \bar{Y}^2 \right\} = \frac{N(N-n)}{n(N-1)} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \frac{N^2 (N-n)}{Nn} S_y^2 \tag{2.6}$$
Hence the proof. •
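The following sketch verifies Theorem 2.2 by complete enumeration on an invented population: the enumerated mean of $\hat{Y}_{srs}$ equals the total and the enumerated variance equals $\frac{N^2(N-n)}{Nn} S_y^2$.

```python
from itertools import combinations

Y = [4.0, 7.0, 11.0, 17.0, 23.0, 2.0, 9.0]   # invented population values
N, n = len(Y), 3
Ybar = sum(Y) / N
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)

samples = list(combinations(range(N), n))
P = 1.0 / len(samples)
est = [N / n * sum(Y[i] for i in s) for s in samples]

mean_est = sum(P * e for e in est)
var_est = sum(P * (e - mean_est) ** 2 for e in est)

assert abs(mean_est - sum(Y)) < 1e-9                          # unbiased for the total
assert abs(var_est - N ** 2 * (N - n) / (N * n) * S2) < 1e-9  # variance of Theorem 2.2
```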
The following theorem gives an unbiased estimator for the variance obtained in
Theorem 2.2.

Theorem 2.3 An unbiased estimator of $V(\hat{Y}_{srs})$ is $v(\hat{Y}_{srs}) = \frac{N^2 (N-n)}{Nn} s_y^2$,
where $s_y^2 = \frac{1}{n-1} \sum_{i \in s} [Y_i - \bar{y}]^2$ is the sample analogue of $S_y^2$ and
$\bar{y} = \frac{1}{n} \sum_{i \in s} Y_i$.

Proof Since $V(\hat{Y}_{srs}) = E(\hat{Y}_{srs}^2) - Y^2$, we have
$$E(\hat{Y}_{srs}^2) = V(\hat{Y}_{srs}) + Y^2 = \frac{N^2 (N-n)}{Nn} S_y^2 + Y^2 \tag{2.7}$$
Further,
$$s_y^2 = \frac{1}{n-1} \left\{ \sum_{i \in s} Y_i^2 - n \bar{y}^2 \right\} = \frac{1}{n-1} \left\{ \sum_{i \in s} Y_i^2 - n \left[\frac{\hat{Y}_{srs}}{N}\right]^2 \right\}$$
Taking expectations on both sides, and using (2.7) together with
$E\left[\sum_{i \in s} Y_i^2\right] = \frac{n}{N} \sum_{i=1}^{N} Y_i^2$, we get
$$E[s_y^2] = \frac{1}{n-1} \left\{ \frac{n}{N} \sum_{i=1}^{N} Y_i^2 - \frac{n}{N^2} \left[ \frac{N^2 (N-n)}{Nn} S_y^2 + Y^2 \right] \right\}$$
$$= \frac{n}{n-1} \left\{ \frac{1}{N} \sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 - \frac{N-n}{Nn} S_y^2 \right\}$$
$$= \frac{n}{n-1} \left\{ \frac{N-1}{N} S_y^2 - \frac{N-n}{Nn} S_y^2 \right\}$$
$$= \frac{n}{n-1} \cdot \frac{n(N-1) - (N-n)}{Nn} S_y^2 = \frac{n}{n-1} \cdot \frac{N(n-1)}{Nn} S_y^2 = S_y^2 \tag{2.8}$$
This implies
$$E[v(\hat{Y}_{srs})] = \frac{N^2 (N-n)}{Nn} S_y^2 = V(\hat{Y}_{srs})$$
Hence the proof. •
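A complete-enumeration check of (2.8), with invented values, is sketched below: averaging $s_y^2$ over all samples reproduces $S_y^2$.

```python
from itertools import combinations

Y = [4.0, 7.0, 11.0, 17.0, 23.0, 2.0, 9.0]   # invented population values
N, n = len(Y), 3
Ybar = sum(Y) / N
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)

def s2(s):
    """Sample analogue s_y^2 computed from the units in s."""
    ybar = sum(Y[i] for i in s) / n
    return sum((Y[i] - ybar) ** 2 for i in s) / (n - 1)

samples = list(combinations(range(N), n))
expected_s2 = sum(s2(s) for s in samples) / len(samples)
assert abs(expected_s2 - S2) < 1e-9   # E[s_y^2] = S_y^2, as required in (2.8)
```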

Theorem 2.4 Let $(X_i, Y_i)$ be the values with respect to the two variables x and
y associated with the unit having label i, $i = 1, 2, \ldots, N$. If $\hat{X} = \frac{N}{n} \sum_{i \in s} X_i$,
$\hat{Y} = \frac{N}{n} \sum_{i \in s} Y_i$ and $S_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})$, then under simple random
sampling, $\mathrm{cov}(\hat{X}, \hat{Y}) = \frac{N^2 (N-n)}{Nn} S_{xy}$.

Proof
$$\mathrm{cov}(\hat{X}, \hat{Y}) = E[\hat{X} - E(\hat{X})][\hat{Y} - E(\hat{Y})] = E[\hat{X} \hat{Y}] - E(\hat{X}) E(\hat{Y})$$
$$= \left[\frac{N}{n}\right]^2 E\left[\sum_{i \in s} X_i \sum_{i \in s} Y_i\right] - XY$$
$$= \left[\frac{N}{n}\right]^2 E\left[\sum_{i \in s} Y_i X_i + \sum_{\substack{i, j \in s \\ i \neq j}} Y_i X_j\right] - XY$$
$$= \left[\frac{N}{n}\right]^2 \left\{ \frac{n}{N} \sum_{i=1}^{N} Y_i X_i + \frac{n(n-1)}{N(N-1)} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} Y_i X_j \right\} - XY \tag{2.9}$$
It is well known that
$$\sum_{i=1}^{N} Y_i \sum_{j=1}^{N} X_j = \sum_{i=1}^{N} Y_i X_i + \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} Y_i X_j$$
Hence
$$\sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} Y_i X_j = N^2 \bar{Y} \bar{X} - \sum_{i=1}^{N} Y_i X_i \tag{2.10}$$
Substituting (2.10) in (2.9) and using $XY = N^2 \bar{X} \bar{Y}$, we get
$$\mathrm{cov}(\hat{X}, \hat{Y}) = \left[\frac{N}{n}\right]^2 \left[\frac{n}{N} - \frac{n(n-1)}{N(N-1)}\right] \sum_{i=1}^{N} X_i Y_i + \left\{\left[\frac{N}{n}\right]^2 \frac{n(n-1)}{N(N-1)} - 1\right\} N^2 \bar{Y} \bar{X}$$
$$= \frac{N(N-n)}{n(N-1)} \left[\sum_{i=1}^{N} X_i Y_i - N \bar{X} \bar{Y}\right]$$
$$= \frac{N^2 (N-n)}{Nn} \cdot \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})$$
Hence the proof. •
Theorem 2.5 Under simple random sampling,
$$s_{xy} = \frac{1}{n-1} \sum_{i \in s} (X_i - \bar{x})(Y_i - \bar{y})$$
is unbiased for $S_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})$, where $\bar{x} = \frac{1}{n} \sum_{i \in s} X_i$ and
$\bar{y} = \frac{1}{n} \sum_{i \in s} Y_i$.

Proof
$$s_{xy} = \frac{1}{n-1} \left[ \sum_{i \in s} X_i Y_i - n \bar{x} \bar{y} \right]$$
$$= \frac{1}{n-1} \sum_{i \in s} X_i Y_i - \frac{1}{n(n-1)} \left[ \sum_{i \in s} Y_i X_i + \sum_{\substack{i, j \in s \\ i \neq j}} Y_i X_j \right]$$
$$= \frac{1}{n} \sum_{i \in s} Y_i X_i - \frac{1}{n(n-1)} \sum_{\substack{i, j \in s \\ i \neq j}} Y_i X_j$$
Taking expectations on both sides, we get
$$E[s_{xy}] = \frac{1}{n} \cdot \frac{n}{N} \sum_{i=1}^{N} Y_i X_i - \frac{1}{n(n-1)} \cdot \frac{n(n-1)}{N(N-1)} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} Y_i X_j$$
$$= \frac{1}{N} \sum_{i=1}^{N} Y_i X_i - \frac{1}{N(N-1)} \left[ N^2 \bar{Y} \bar{X} - \sum_{i=1}^{N} Y_i X_i \right]$$
$$= \left[ \frac{1}{N} + \frac{1}{N(N-1)} \right] \sum_{i=1}^{N} Y_i X_i - \frac{N}{N-1} \bar{Y} \bar{X}$$
$$= \frac{1}{N-1} \left[ \sum_{i=1}^{N} Y_i X_i - N \bar{Y} \bar{X} \right] = S_{xy}$$
Hence the proof. •

Remark 2.1 If $\bar{y} = \frac{\hat{Y}_{srs}}{N}$, then under simple random sampling $\bar{y}$ is unbiased for
the population mean and its variance is $\frac{N-n}{Nn} S_y^2$.
This remark follows from Theorem 2.2.
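Theorem 2.5 admits the same kind of enumeration check; the sketch below, with invented paired values, confirms that $s_{xy}$ averages to $S_{xy}$ over all samples.

```python
from itertools import combinations

X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]     # invented paired values
Y = [4.0, 7.0, 11.0, 17.0, 23.0, 2.0]
N, n = 6, 3
Xbar, Ybar = sum(X) / N, sum(Y) / N
Sxy = sum((X[i] - Xbar) * (Y[i] - Ybar) for i in range(N)) / (N - 1)

def sxy(s):
    """Sample analogue s_xy computed from the units in s."""
    xbar = sum(X[i] for i in s) / n
    ybar = sum(Y[i] for i in s) / n
    return sum((X[i] - xbar) * (Y[i] - ybar) for i in s) / (n - 1)

samples = list(combinations(range(N), n))
assert abs(sum(sxy(s) for s in samples) / len(samples) - Sxy) < 1e-9
```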

2.3 Problems and Solutions
Problem 2.1 After the decision to take a simple random sample had been made,
it was realised that $Y_1$, the value of the unit with label 1, would be unusually low
and $Y_N$, the value of the unit with label N, would be unusually high. In such
situations, it is decided to use the estimator
$$\hat{Y}^* = \begin{cases} \bar{y} + C & \text{if the sample contains } Y_1 \text{ but not } Y_N \\ \bar{y} - C & \text{if the sample contains } Y_N \text{ but not } Y_1 \\ \bar{y} & \text{for all other samples} \end{cases}$$
where the constant C is positive and predetermined, so that the adjustment offsets the anticipated error: samples containing the unusually high $Y_N$ are corrected downwards. Show that the estimator
$\hat{Y}^*$ is unbiased and its variance is
$$V(\hat{Y}^*) = \frac{N-n}{N} \left[ \frac{S_y^2}{n} - \frac{2C}{N-1} (Y_N - Y_1 - nC) \right]$$
Also prove that $V(\hat{Y}^*) < V(\bar{y})$ whenever $0 < nC < Y_N - Y_1$.

Solution Let $\Omega_n = \{s \mid n(s) = n\}$. Partition $\Omega_n$ into three disjoint subclasses as
$$\Omega_1 = \{s \mid n(s) = n,\ s \text{ contains } 1 \text{ but not } N\},$$
$$\Omega_2 = \{s \mid n(s) = n,\ s \text{ contains } N \text{ but not } 1\}$$
and $\Omega_3 = \Omega_n - \Omega_1 - \Omega_2$.
It is to be noted that the numbers of subsets in $\Omega_1, \Omega_2$ and $\Omega_3$ are,
respectively, $\binom{N-2}{n-1}$, $\binom{N-2}{n-1}$ and $\binom{N}{n} - 2\binom{N-2}{n-1}$.
Under simple random sampling
$$E(\hat{Y}^*) = \sum_{s \in \Omega_n} \hat{Y}^* \binom{N}{n}^{-1}$$
$$= \binom{N}{n}^{-1} \left\{ \sum_{s \in \Omega_1} [\bar{y} + C] + \sum_{s \in \Omega_2} [\bar{y} - C] + \sum_{s \in \Omega_3} \bar{y} \right\}$$
$$= \binom{N}{n}^{-1} \left\{ \sum_{s \in \Omega_n} \bar{y} + C \binom{N-2}{n-1} - C \binom{N-2}{n-1} \right\}$$
$$= \binom{N}{n}^{-1} \sum_{s \in \Omega_n} \bar{y} = \bar{Y} \quad \text{(refer to Remark 2.1)}$$
Therefore the estimator $\hat{Y}^*$ is unbiased for the population mean. The variance
of the estimator $\hat{Y}^*$ is, by definition,
$$V(\hat{Y}^*) = \sum_{s \in \Omega_n} [\hat{Y}^* - \bar{Y}]^2 \binom{N}{n}^{-1}$$
$$= \binom{N}{n}^{-1} \left\{ \sum_{s \in \Omega_1} [\bar{y} + C - \bar{Y}]^2 + \sum_{s \in \Omega_2} [\bar{y} - C - \bar{Y}]^2 + \sum_{s \in \Omega_3} [\bar{y} - \bar{Y}]^2 \right\}$$
$$= V(\bar{y}) + \binom{N}{n}^{-1} \left\{ 2C \sum_{s \in \Omega_1} [\bar{y} - \bar{Y}] - 2C \sum_{s \in \Omega_2} [\bar{y} - \bar{Y}] + 2 C^2 \binom{N-2}{n-1} \right\} \tag{2.11}$$
All the members of $\Omega_1$ contain the unit with label 1, $\binom{N-3}{n-2}$ of them contain the unit
with label j $(j = 2, 3, \ldots, N-1)$ and none of them contain the unit with label N.
Therefore
$$\sum_{s \in \Omega_1} [\bar{y} - \bar{Y}] = \frac{1}{n} \left[ \binom{N-2}{n-1} Y_1 + \binom{N-3}{n-2} \sum_{j=2}^{N-1} Y_j \right] - \binom{N-2}{n-1} \bar{Y} \tag{2.12}$$
Proceeding in the same way, we get
$$\sum_{s \in \Omega_2} [\bar{y} - \bar{Y}] = \frac{1}{n} \left[ \binom{N-2}{n-1} Y_N + \binom{N-3}{n-2} \sum_{j=2}^{N-1} Y_j \right] - \binom{N-2}{n-1} \bar{Y} \tag{2.13}$$
Subtracting (2.13) from (2.12), the terms involving $\sum_{j=2}^{N-1} Y_j$ and $\bar{Y}$ cancel, leaving
$$\sum_{s \in \Omega_1} [\bar{y} - \bar{Y}] - \sum_{s \in \Omega_2} [\bar{y} - \bar{Y}] = \frac{1}{n} \binom{N-2}{n-1} [Y_1 - Y_N]$$
It can also be seen that
$$\binom{N-2}{n-1} \binom{N}{n}^{-1} = \frac{n(N-n)}{N(N-1)}$$
Using these results in (2.11) we get
$$V(\hat{Y}^*) = V(\bar{y}) + \frac{n(N-n)}{N(N-1)} \left\{ \frac{2C}{n} [Y_1 - Y_N] + 2C^2 \right\}$$
$$= V(\bar{y}) - \frac{2C(N-n)}{N(N-1)} [Y_N - Y_1 - nC]$$
$$= \frac{N-n}{N} \left[ \frac{S_y^2}{n} - \frac{2C}{N-1} (Y_N - Y_1 - nC) \right] \tag{2.14}$$
which is the required result, since
$$V(\bar{y}) = \frac{N-n}{Nn} S_y^2 \tag{2.15}$$
Therefore, comparing (2.14) and (2.15),
$$V(\hat{Y}^*) < V(\bar{y}) \iff \frac{2C}{N-1} [Y_N - Y_1 - nC] > 0 \iff Y_N - Y_1 > nC \quad \text{(when C is positive)},$$
that is, if and only if $0 < nC < Y_N - Y_1$.
Hence the solution. •
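The conclusion of Problem 2.1 can be confirmed numerically. The sketch below enumerates all samples for an invented population in which $Y_1$ is unusually low and $Y_N$ unusually high, and checks both the unbiasedness of $\hat{Y}^*$ and the variance expression (2.14).

```python
from itertools import combinations

Y = [1.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 30.0]  # Y_1 low, Y_N high (invented)
N, n, C = len(Y), 4, 2.0                           # nC = 8 < Y_N - Y_1 = 29
Ybar = sum(Y) / N
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)

samples = list(combinations(range(N), n))
P = 1.0 / len(samples)

def y_star(s):
    ybar = sum(Y[i] for i in s) / n
    if 0 in s and N - 1 not in s:       # contains Y_1 but not Y_N: add C
        return ybar + C
    if N - 1 in s and 0 not in s:       # contains Y_N but not Y_1: subtract C
        return ybar - C
    return ybar

mean = sum(P * y_star(s) for s in samples)
var = sum(P * (y_star(s) - mean) ** 2 for s in samples)
claimed = (N - n) / N * (S2 / n - 2 * C / (N - 1) * (Y[-1] - Y[0] - n * C))

assert abs(mean - Ybar) < 1e-9      # unbiased
assert abs(var - claimed) < 1e-9    # matches (2.14)
```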

Problem 2.2 Given the information in Problem 2.1, an alternative plan is to
include both $Y_1$ and $Y_8$ in every sample, drawing a simple random sample of size 2 from the
units with labels 2, 3, ..., 7, when N = 8 and n = 4. Let $\bar{y}_s$ be the mean of the 2
units so selected. Show that the estimator
$$\hat{Y}' = \frac{Y_1 + 6\bar{y}_s + Y_8}{8}$$
is unbiased for the population mean with variance $\frac{9 V(\bar{y}_s)}{16}$.

Solution
$$\hat{Y}' = \frac{Y_1 + 6\bar{y}_s + Y_8}{8} = \frac{Y_1 + Y_8}{8} + \left[\frac{6}{8}\right] \frac{1}{2} \sum_{i \in s} Y_i$$
Taking expectations on both sides we get
$$E[\hat{Y}'] = \frac{Y_1 + Y_8}{8} + \left[\frac{6}{8}\right] \frac{1}{2} \sum_{i=2}^{7} E[I_i] Y_i \tag{2.16}$$
where $I_i = 1$ if $i \in s$ and $I_i = 0$ otherwise.
Since $E[I_i] = \frac{2}{6} = \frac{1}{3}$, we get from (2.16)
$$E[\hat{Y}'] = \frac{Y_1 + Y_8}{8} + \frac{1}{8} \sum_{i=2}^{7} Y_i = \bar{Y}$$
Hence $\hat{Y}'$ is unbiased for the population mean. Further, since $Y_1$ and $Y_8$ enter
every sample, the only random part of $\hat{Y}'$ is $\frac{6}{8} \bar{y}_s$, so that
$$V(\hat{Y}') = \left[\frac{6}{8}\right]^2 V(\bar{y}_s) = \frac{9 V(\bar{y}_s)}{16}$$
Hence the solution. •
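Problem 2.2 can be checked the same way; the sketch below enumerates the $\binom{6}{2}$ possible pairs for an invented population and confirms both the unbiasedness of $\hat{Y}'$ and the factor 9/16.

```python
from itertools import combinations

Y = [1.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 30.0]  # invented, N = 8
Ybar = sum(Y) / 8

# Units 1 and 8 are always included; 2 of the middle six are drawn at random.
pairs = list(combinations(range(1, 7), 2))          # indices of labels 2..7
P = 1.0 / len(pairs)

ys = [sum(Y[i] for i in s) / 2 for s in pairs]      # mean of the 2 drawn units
yp = [(Y[0] + 6 * y + Y[7]) / 8 for y in ys]        # the estimator Y'

m_ys = sum(P * y for y in ys)
v_ys = sum(P * (y - m_ys) ** 2 for y in ys)
m_yp = sum(P * y for y in yp)
v_yp = sum(P * (y - m_yp) ** 2 for y in yp)

assert abs(m_yp - Ybar) < 1e-9            # unbiased for the population mean
assert abs(v_yp - 9 * v_ys / 16) < 1e-9   # variance is (6/8)^2 = 9/16 of V(ybar_s)
```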